Preface


This book provides a modern, self-contained introduction to digital image processing. We designed the book to be used both by learners desiring a firm foundation on which to build and by practitioners in search of detailed analysis and transparent implementations of the most important techniques. This is the second English edition of the original German-language book, which has been widely used by:

• Scientists and engineers who use image processing as a tool and wish to develop a deeper understanding and create custom solutions to imaging problems in their field.

• IT professionals wanting a self-study course featuring easily adaptable code and completely worked out examples, enabling them to be productive right away.

• Faculty and students desiring an example-rich introductory textbook suitable for an advanced undergraduate or graduate level course that features exercises, projects, and examples that have been honed during our years of experience teaching this material.

While we concentrate on practical applications and concrete implementations, we do so without glossing over the important formal details and mathematics necessary for a deeper understanding of the algorithms. In preparing this text, we started from the premise that simply creating a recipe book of imaging solutions would not provide the deeper understanding needed to apply these techniques to novel problems, so instead our solutions are developed stepwise from three different perspectives: in mathematical form, as abstract pseudocode algorithms, and as complete Java programs. We use a common notation to intertwine all three perspectives, providing multiple, but linked, views of the problem and its solution.

Prerequisites 

Instead of presenting digital image processing as a mathematical discipline, or strictly as signal processing, we present it from a practitioner’s and programmer’s perspective and with a view toward replacing many of the formalisms commonly used in other texts with constructs more readily understandable by our audience. To take full advantage of the programming components of this book, a knowledge of basic data structures and object-oriented programming, ideally in Java, is required. We selected Java for a number of reasons: it is the first programming language learned by students in a wide variety of engineering curricula, and professionals with knowledge of a related language, especially C# or C++, will find the programming examples easy to follow and extend.
The software in this book is designed to work with ImageJ, a widely used, programmer-extensible imaging system developed, maintained, and distributed by the National Institutes of Health (NIH).¹ ImageJ is implemented completely in Java and therefore runs on all major platforms, and it is widely used because its “plugin”-based architecture enables it to be easily extended. While all examples run in ImageJ, they have been specifically designed to be easily ported to other environments and programming languages.

Use in research and development

This book has been especially designed for use as a textbook and as such features exercises and carefully constructed examples that supplement our detailed presentation of the fundamental concepts and techniques. As both practitioners and developers, we know that the details required to successfully understand, apply, and extend classical techniques are often difficult to find, and for this reason we have been very careful to provide the missing details, many gleaned over years of practical application. While this should make the text particularly valuable to those in research and development, it is not designed as a comprehensive, fully cited scientific research text. On the contrary, we have carefully vetted our citations so that they can be obtained from easily accessible sources. While we have only briefly discussed the fundamentals of, or entirely omitted, topics such as hierarchical methods, wavelets, or eigenimages because of space limitations, other topics have been left out deliberately, including advanced issues such as object recognition, image understanding, and three-dimensional (3D) computer vision. So, while most techniques described in this book could be called “blind and dumb”, it is our experience that straightforward, technically clean implementations of these simpler methods are essential to the success of any further domain-specific, or even “intelligent”, approaches.

If you are only in search of a programming handbook for ImageJ or Java, there are certainly better sources. While the book includes many code examples, programming in and of itself is not our main focus. Instead, Java serves as just one important element for describing each technique in a precise and immediately testable way.

Classroom use

Whether it is called signal processing, image processing, or media computation, the manipulation of digital images has been an integral part of most computer science and engineering curricula for many years. Today, with the omnipresence of all-digital workflows, it has become an integral part of the required skill set for professionals in many diverse disciplines.

Today the topic has migrated into the early stages of many curricula, where it is often a key foundation course. This migration uncovered a problem in that many of the texts relied on as standards in the older graduate-level courses were not appropriate for beginners. The texts were usually too formal for novices, and at the same time did not provide detailed coverage of many of the most popular methods used in actual practice. The result was that educators had a difficult time selecting a single textbook or even finding a compact collection of literature to recommend to their students. Faced with this dilemma ourselves, we wrote this book in the sincere hope of filling this gap.

¹ http://rsb.info.nih.gov/ij/.

The contents of the following chapters can be presented in either a one- or two-semester sequence. Where feasible, we have added supporting material in order to make each chapter as independent as possible, providing the instructor with maximum flexibility when designing the course. Chapters 18-20 offer a complete introduction to the fundamental spectral techniques used in image processing and are essentially independent of the other material in the text. Depending on the goals of the instructor and the curriculum, they can be covered in as much detail as required or completely omitted. The following road map shows a possible partitioning of topics for a two-semester syllabus.


Road Map for a 1/2-Semester Syllabus

Chapter                                         Sem. 1   Sem. 2
 1. Digital Images                                ■        □
 2. ImageJ                                        ■        □
 3. Histograms and Image Statistics               ■        □
 4. Point Operations                              ■        □
 5. Filters                                       ■        □
 6. Edges and Contours                            ■        □
 7. Corner Detection                              □        ■
 8. The Hough Transform: Finding Simple Curves    □        ■
 9. Morphological Filters                         ■        □
10. Regions in Binary Images                      ■        □
11. Automatic Thresholding                        □        ■
12. Color Images                                  ■        □
13. Color Quantization                            □        ■
14. Colorimetric Color Spaces                     □        ■
15. Filters for Color Images                      □        ■
16. Edge Detection in Color Images                □        ■
17. Edge-Preserving Smoothing Filters             □        ■
18. Introduction to Spectral Techniques           □        ■
19. The Discrete Fourier Transform in 2D          □        ■
20. The Discrete Cosine Transform (DCT)           □        ■
21. Geometric Operations                          ■        □
22. Pixel Interpolation                           ■        □
23. Image Matching and Registration               ■        □
24. Non-Rigid Image Matching                      □        ■
25. Scale-Invariant Local Features (SIFT)         □        ■
26. Fourier Shape Descriptors                     □        ■


Addendum to the second edition

This second edition is based on our completely revised German third edition and contains both additional material and several new chapters, including automatic thresholding (Ch. 11), filters and edge detection for color images (Chs. 15 and 16), edge-preserving smoothing filters (Ch. 17), non-rigid image matching (Ch. 24), and Fourier shape descriptors (Ch. 26). Much of this new material is presented for the first time at the level of detail necessary to completely understand and create a working implementation.

The two final chapters on SIFT and Fourier shape descriptors are particularly detailed to demonstrate the actual level of granularity and the special cases which must be considered when actually implementing complex techniques. Some other chapters have been rearranged or split into multiple parts for more clarity and easier use in teaching. The mathematical notation and programming examples were completely revised and almost all illustrations were adapted or created anew for this full-color edition.

For this edition, the ImageJ Short Reference and ancillary source code have been relocated from the Appendix, and the most recent versions are freely available in electronic form from the book’s website. The complete source code, consisting of the common imagingbook library, sample ImageJ plugins for each book chapter, and extended documentation, is available from the book’s SourceForge site.²

Online resources

Visit the website for this book

www.imagingbook.com

to download supplementary materials, including the complete Java source code for all examples and the underlying software library, full-size test images, useful references, and other supplements. Comments, questions, and corrections are welcome and may be addressed to

imagingbook@gmail.com

Exercises and solutions

Each chapter of this book contains a set of sample exercises, mainly to support instructors in preparing their own assignments. Most of these tasks are easy to solve after studying the corresponding chapter, while some others may require more elaborate reasoning or experimental work. We assume that instructors know best how to select and adapt individual assignments in order to fit the level and interest of their students. This is the main reason why we have abstained from publishing explicit solutions in the past. However, we are happy to answer any personal request if an exercise is unclear or seems to elude a simple solution.

Thank you!

This book would not have been possible without the understanding and support of our families. Our thanks go to Wayne Rasband at NIH for developing ImageJ and for his truly outstanding support of the community and to all our readers of the previous editions who provided valuable input, suggestions for improvement, and encouragement. The use of open source software for such a project always carries an element of risk, since the long-term acceptance and continuity are difficult to assess. Retrospectively, choosing ImageJ as the software basis for this work was a good decision, and we would consider ourselves happy if our book has indirectly contributed to the success of the ImageJ project itself. Finally, we owe a debt of gratitude to the professionals at Springer, particularly to Wayne Wheeler and Simon Reeves who were responsible for the English edition.

² http://sourceforge.net/projects/imagingbook/.

Hagenberg / Washington D.C.

Fall 2015
Digital Images


For a long time, using a computer to manipulate a digital image (i.e., digital image processing) was something performed by only a relatively small group of specialists who had access to expensive equipment. Usually this combination of specialists and equipment was only to be found in research labs, and so the field of digital image processing has its roots in the academic realm. Now, however, the combination of a powerful computer on every desktop and the fact that nearly everyone has some type of device for digital image acquisition, be it their cell phone camera, digital camera, or scanner, has resulted in a plethora of digital images; with that, digital image processing has become as common for many people as word processing. It was not that many years ago that digitizing a photo and saving it to a file on a computer was a time-consuming task. This is perhaps difficult to imagine given today’s powerful hardware and operating-system-level support for all types of digital media, but it is always sobering to remember that “personal” computers in the early 1990s were not powerful enough to even load into main memory a single image from a typical digital camera of today. Now powerful hardware and software packages have made it possible for amateurs to manipulate digital images and videos just as easily as professionals.

All of these developments have resulted in a large community that works productively with digital images while having only a basic understanding of the underlying mechanics. For the typical consumer merely wanting to create a digital archive of vacation photos, a deeper understanding is not required, just as a deep understanding of the combustion engine is unnecessary to successfully drive a car.

Today, IT professionals must be more than simply familiar with digital image processing. They are expected to be able to knowledgeably manipulate images and related digital media, which are an increasingly important part of the workflow not only of those involved in medicine and media but of all industries. In the same way, software engineers and computer scientists are increasingly confronted with developing programs, databases, and related systems that must correctly deal with digital images. The simple lack of practical experience with this type of material, combined with an often unclear understanding of its basic foundations and a tendency to underestimate its difficulties, frequently leads to inefficient solutions, costly errors, and personal frustration.


1.1 Programming with Images

Even though the term “image processing” is often used interchangeably with that of “image editing”, we introduce the following more precise definitions. Digital image editing, or as it is sometimes referred to, digital imaging, is the manipulation of digital images using an existing software application such as Adobe Photoshop® or Corel Paint®. Digital image processing, on the other hand, is the conception, design, development, and enhancement of digital imaging programs.

Modern programming environments, with their extensive APIs (application programming interfaces), make practically every aspect of computing, be it networking, databases, graphics, sound, or imaging, easily available to nonspecialists. The possibility of developing a program that can reach into an image and manipulate the individual elements at its very core is fascinating and seductive. You will discover that with the right knowledge, an image becomes ultimately no more than a simple array of values that, with the right tools, you can manipulate in any way imaginable.

“Computer graphics”, in contrast to digital image processing, concentrates on the synthesis of digital images from geometrical descriptions such as three-dimensional (3D) object models [75,87,247]. While graphics professionals today tend to be interested in topics such as realism and, especially in terms of computer games, rendering speed, the field does draw on a number of methods that originate in image processing, such as image transformation (morphing), reconstruction of 3D models from image data, and specialized techniques such as image-based and nonphotorealistic rendering [180,248]. Similarly, image processing makes use of a number of ideas that have their origin in computational geometry and computer graphics, such as volumetric (voxel) models in medical image processing. The two fields perhaps work closest when it comes to digital postproduction of film and video and the creation of special effects [256]. This book provides a thorough grounding in the effective processing of not only images but also sequences of images; that is, videos.


1.2 Image Analysis and Computer Vision

Often it appears at first glance that a given image-processing task will have a simple solution, especially when it is something that is easily accomplished by our own visual system. Yet in practice it turns out that developing reliable, robust, and timely solutions is difficult or simply impossible. This is especially true when the problem involves image analysis; that is, where the ultimate goal is not to enhance or otherwise alter the appearance of an image but instead to extract meaningful information about its contents. Be it distinguishing an object from its background, following a street on a map, or finding the bar code on a milk carton, tasks such as these often turn out to be much more difficult to accomplish than we would expect.

We expect technology to improve on what we can do by ourselves. Be it as simple as a lever to lift more weight or binoculars to see farther, or as complex as an airplane to move us across continents, science has created so much that improves on, sometimes by unbelievable factors, what our biological systems are able to perform. So, it is perhaps humbling to discover that today’s technology is nowhere near as capable, when it comes to image analysis, as our own visual system. While it is possible that this will always remain true, do not let this discourage you. Instead, consider it a challenge to develop creative solutions. Using the tools, techniques, and fundamental knowledge available today, it is possible not only to solve many problems but to create robust, reliable, and fast applications.

While image analysis is not the main subject of this book, it often naturally intersects with image processing, and we will explore this intersection in detail in these situations: finding simple curves (Ch. 8), segmenting image regions (Ch. 10), and comparing images (Ch. 23). In these cases, we present solutions that work directly on the pixel data in a bottom-up way without recourse to domain-specific knowledge (i.e., blind solutions). In this way, our solutions essentially embody the distinction between image processing on the one hand and pattern recognition and computer vision on the other. While these two disciplines are firmly grounded in, and rely heavily on, image processing, their ultimate goals are much loftier.

Pattern recognition is primarily a mathematical discipline and has been responsible for techniques such as clustering, hidden Markov models (HMMs), decision trees, and principal component analysis (PCA), which are used to discover patterns in data and signals.

Methods from pattern recognition have been applied extensively to problems arising in computer vision and image analysis. A good example of their successful application is optical character recognition (OCR), where robust, highly accurate turnkey solutions are available for recognizing scanned text. Pattern recognition methods are truly universal and have been successfully applied not only to images but also to speech and audio signals, text documents, stock trades, and finding trends in large databases, where it is often called data mining.

Dimensionality reduction, statistical, and syntactical methods play important roles in pattern recognition (see, e.g., [64,169,228]).

Computer vision tackles the problem of engineering artificial visual systems capable of somehow comprehending and interpreting our real, 3D world. Popular topics in this field include scene understanding, object recognition, motion interpretation (tracking), autonomous navigation, and the robotic manipulation of objects in a scene. Since computer vision has its roots in artificial intelligence (AI), many AI methods were originally developed to either tackle or represent a problem in computer vision (see, e.g., [51, Ch. 13]). The fields still have much in common today, especially in terms of adaptive methods and machine learning. Further literature on computer vision includes [15,78,110,214,222,232].

Ultimately you will find image processing to be both intellectually challenging and professionally rewarding, as the field is ripe with problems that were originally thought to be relatively simple to solve but have to this day refused to give up their secrets. With the background and techniques presented in this text, you will not only be able to develop complete image-processing solutions but will also have the prerequisite knowledge to tackle unsolved problems and the real possibility of expanding the horizons of science: for while image processing by itself may not change the world, it is likely to be the foundation that supports marvels of the future.


1.3 Types of Digital Images

Digital images are the central theme of this book, and unlike just a few years ago, this term is now so commonly used that there is really no reason to explain it further. Yet this book is not about all types of digital images; instead, it focuses on images that are made up of picture elements, more commonly known as pixels, arranged in a regular rectangular grid.

Every day, people work with a large variety of digital raster images such as color photographs of people and landscapes, grayscale scans of printed documents, building plans, faxed documents, screenshots, medical images such as x-rays and ultrasounds, and a multitude of others (see Fig. 1.1 for examples). Despite all the different sources for these images, they are all, as a rule, ultimately represented as rectangular ordered arrays of image elements.


1.4 Image Acquisition

The process by which a scene becomes a digital image is varied and complicated, and, in most cases, the images you work with will already be in digital form, so we only outline here the essential stages in the process. As most image acquisition methods are essentially variations on the classical optical camera, we will begin by examining it in more detail.


1.4.1 The Pinhole Camera Model

The pinhole camera is one of the simplest camera models and has been in use since the 13th century, when it was known as the “Camera Obscura”. While pinhole cameras have no practical use today except to hobbyists, they are a useful model for understanding the essential optical components of a simple camera. The pinhole camera consists of a closed box with a small opening on the front side through which light enters, forming an image on the opposing wall. The light forms a smaller, inverted image of the scene (Fig. 1.2).

Fig. 1.1

Examples of digital images. Natural landscape (a), synthetically generated scene (b), poster graphic (c), computer screenshot (d), black and white illustration (e), barcode (f), fingerprint (g), x-ray (h), microscope slide (i), satellite image (j), radar image (k), astronomical object (l).


Perspective projection

The geometric properties of the pinhole camera are very simple. The optical axis runs through the pinhole perpendicular to the image plane. We assume a visible object, in our illustration the cactus, located at a horizontal distance Z from the pinhole and vertical distance Y from the optical axis. The height of the projection y is determined by two parameters: the fixed depth of the camera box f and the distance Z to the object from the origin of the coordinate system. By comparing similar triangles we see that

$$x = -f \cdot \frac{X}{Z} \quad \text{and} \quad y = -f \cdot \frac{Y}{Z}. \tag{1.1}$$
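Eqn. (1.1) translates directly into code. The following Java sketch is illustrative only: the class and method names are hypothetical, and it assumes the point lies in front of the pinhole (Z > 0), since the projection is undefined for Z = 0.

```java
/** Hypothetical sketch of the ideal pinhole projection, Eqn. (1.1). */
class PinholeProjection {

    /**
     * Projects the 3D point (X, Y, Z) onto the image plane of a pinhole camera
     * with box depth ("focal length") f and returns {x, y}.
     */
    static double[] project(double X, double Y, double Z, double f) {
        if (Z <= 0) {
            throw new IllegalArgumentException("point must lie in front of the pinhole (Z > 0)");
        }
        double x = -f * X / Z; // horizontal image coordinate
        double y = -f * Y / Z; // vertical image coordinate
        return new double[] {x, y};
    }

    public static void main(String[] args) {
        // Hypothetical point 2 m to the side, 1 m up, 10 m in front; f = 0.05 m.
        double[] p = project(2.0, 1.0, 10.0, 0.05);
        System.out.println("x = " + p[0] + ", y = " + p[1]);
    }
}
```

For this example point the result is x ≈ -0.01 and y ≈ -0.005 (up to floating-point rounding); both coordinates are negative because of the minus sign in Eqn. (1.1), and doubling f doubles both values.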
Fig. 1.2

Geometry of the pinhole camera. The pinhole opening serves as the origin (O) of the 3D coordinate system (X, Y, Z) for the objects in the scene. The optical axis, which runs through the opening, is the Z axis of this coordinate system. A separate 2D coordinate system (x, y) describes the projection points on the image plane. The distance f (“focal length”) between the opening and the image plane determines the scale of the projection.


The scale of the resulting image changes in proportion to the depth of the box (i.e., the distance f), in a way similar to how the focal length does in an everyday camera. For a fixed scene, a small f (i.e., short focal length) results in a small image and a large viewing angle, just as occurs when a wide-angle lens is used, while increasing the “focal length” f results in a larger image and a smaller viewing angle, just as occurs when a telephoto lens is used. The negative sign in Eqn. (1.1) means that the projected image is flipped in both the horizontal and vertical directions, which is equivalent to a rotation by 180°.
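The relation between f and the viewing angle can be quantified. Assuming a flat image plane of width w centered on the optical axis (an assumption not stated in the text), the horizontal viewing angle is θ = 2·arctan(w / (2f)). The Java sketch below evaluates this for a few focal lengths; the 36 mm width is a hypothetical example value.

```java
/** Hypothetical sketch: viewing angle of a pinhole camera with image-plane width w. */
class ViewingAngle {

    /** Horizontal viewing angle in degrees; w and f must use the same length unit. */
    static double viewingAngleDeg(double w, double f) {
        return Math.toDegrees(2.0 * Math.atan(w / (2.0 * f)));
    }

    public static void main(String[] args) {
        double w = 36.0; // assumed image-plane width in mm (example value)
        for (double f : new double[] {24.0, 50.0, 200.0}) {
            System.out.printf("f = %5.1f mm -> viewing angle = %5.1f deg%n", f, viewingAngleDeg(w, f));
        }
    }
}
```

For w = 36 mm the angles come out at roughly 74°, 40°, and 10° for f = 24, 50, and 200 mm: the angle shrinks as f grows, matching the wide-angle versus telephoto behavior described above.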

Equation (1.1) describes what is commonly known today as the perspective transformation.¹ Important properties of this theoretical model are that straight lines in 3D space always appear straight in 2D projections and that circles appear as ellipses.
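The claim that straight lines stay straight can be checked numerically (a demonstration, not a proof). The Java sketch below takes three equally spaced points on a hypothetical 3D line, projects them with Eqn. (1.1), and computes the 2D cross product of the projected points, which is zero exactly when they are collinear.

```java
/** Hypothetical demonstration: the pinhole projection maps collinear 3D points to collinear 2D points. */
class LinePreservation {

    /** Pinhole projection of Eqn. (1.1) for P = {X, Y, Z}; assumes Z > 0. */
    static double[] project(double[] P, double f) {
        return new double[] {-f * P[0] / P[2], -f * P[1] / P[2]};
    }

    /** 2D cross product of (b - a) and (c - a); zero iff a, b, c are collinear. */
    static double cross(double[] a, double[] b, double[] c) {
        return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0]);
    }

    public static void main(String[] args) {
        double f = 1.0;
        double[] A = {1, 2, 5};   // a point on the 3D line (example values)
        double[] D = {2, -1, 3};  // direction of the 3D line (example values)
        double[][] img = new double[3][];
        for (int t = 0; t < 3; t++) {
            double[] P = {A[0] + t * D[0], A[1] + t * D[1], A[2] + t * D[2]};
            img[t] = project(P, f);
        }
        System.out.println("cross product = " + cross(img[0], img[1], img[2]));
    }
}
```

The printed value is zero up to floating-point rounding. Note that the projected points are no longer equally spaced: the projection preserves straightness but not distances.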


1.4.2 The “Thin” Lens

While the simple geometry of the pinhole camera makes it useful for understanding its basic principles, it is never really used in practice. One of the problems with the pinhole camera is that it requires a very small opening to produce a sharp image. This in turn reduces the amount of light passed through and thus leads to extremely long exposure times. In reality, glass lenses or systems of optical lenses are used whose optical properties are greatly superior in many aspects but of course are also much more complex. Instead, we can make our model more realistic, without unduly increasing its complexity, by replacing the pinhole with a “thin lens” as in Fig. 1.3.

In this model, the lens is assumed to be symmetric and infinitely thin, such that all light rays passing through it cross through a virtual plane in the middle of the lens. The resulting image geometry is the same as that of the pinhole camera. This model is not sufficiently complex to encompass the physical details of actual lens systems, such as geometrical distortions and the distinct refraction properties of different colors. So, while this simple model suffices for our purposes (i.e., understanding the mechanics of image acquisition), much more detailed models that incorporate these additional complexities can be found in the literature (see, e.g., [126]).

¹ It is hard to imagine today that the rules of perspective geometry, while known to the ancient mathematicians, were only rediscovered in 1430 by the Renaissance painter Brunelleschi.

Fig. 1.3

Thin lens projection model.

1.4.3 Going Digital

What is projected on the image plane of our camera is essentially a two-dimensional (2D), time-dependent, continuous distribution of light energy. In order to convert this image into a digital image on our computer, the following three main steps are necessary:

1. The continuous light distribution must be spatially sampled.

2. The resulting function must then be sampled in time to create a single (still) image.

3. Finally, the resulting values must be quantized to a finite range of integers (or floating-point values) such that they can be represented by digital numbers.

Step 1: Spatial sampling

The spatial sampling of an image (i.e., the conversion of the continuous signal to its discrete representation) depends on the geometry of the sensor elements of the acquisition device (e.g., a digital or video camera). The individual sensor elements are arranged in ordered rows, almost always at right angles to each other, along the sensor plane (Fig. 1.4). Other types of image sensors, which include hexagonal elements and circular sensor structures, can be found in specialized products.
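To make spatial sampling concrete, here is a minimal Java sketch that evaluates a continuous light distribution at the centers of a regular M × N grid of sensor elements. It assumes ideal point sampling and models the continuous distribution as a `java.util.function.DoubleBinaryOperator`; the class name `SpatialSampler` and the element-center convention are illustrative choices, not part of the text. A real sensor element integrates the light over its whole area rather than sampling a single point.

```java
import java.util.function.DoubleBinaryOperator;

/** Minimal sketch of spatial sampling on a regular rectangular sensor grid (hypothetical helper). */
final class SpatialSampler {

    /**
     * Samples the continuous function f(x, y), defined on [0, width) x [0, height)
     * in physical units, at the centers of an M x N grid of sensor elements.
     * Point sampling is a simplification: a real sensor element integrates the
     * light falling on its entire area.
     */
    static double[][] sample(DoubleBinaryOperator f, double width, double height, int M, int N) {
        double dx = width / M;   // horizontal spacing of sensor elements
        double dy = height / N;  // vertical spacing of sensor elements
        double[][] samples = new double[N][M]; // indexed [row v][column u]
        for (int v = 0; v < N; v++) {
            for (int u = 0; u < M; u++) {
                double x = (u + 0.5) * dx; // center of sensor element (u, v)
                double y = (v + 0.5) * dy;
                samples[v][u] = f.applyAsDouble(x, y);
            }
        }
        return samples;
    }
}
```

For example, sampling the ramp `(x, y) -> x / 4.0` over a 4 × 2 area with a 4 × 2 grid yields the values 0.125, 0.375, 0.625, and 0.875 in each row.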

Step  2:  Temporal  sampling 

Temporal sampling is carried out by measuring at regular intervals the amount of light incident on each individual sensor element. The CCD² in a digital camera does this by triggering the charging process and then measuring the amount of electrical charge that has built up during the specified amount of time that the CCD was illuminated.

2  Charge-coupled  device. 
Fig. 1.4

The geometry of the sensor elements is directly responsible for the spatial sampling of the continuous image. In the simplest case, a plane of sensor elements is arranged in an evenly spaced grid, and each element measures the amount of light that falls on it.

[Figure: incident light falling on a regular grid of sensor elements]


Step  3:  Quantization  of  pixel  values 

In order to store and process the image values on the computer, they are commonly converted to an integer scale (e.g., 256 = 2^8 or 4096 = 2^12). Occasionally, floating-point values are used in professional applications, such as medical imaging. Conversion is carried out using an analog-to-digital converter, which is typically embedded directly in the sensor electronics so that conversion occurs at image capture, or is performed by special interface hardware.
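Quantization can be sketched as uniform rounding of a normalized intensity to one of 2^k integer levels. The normalized input range [0, 1], round-to-nearest, and clamping of out-of-range values (as a saturated sensor would produce) are assumptions made for this illustration; real analog-to-digital converters may differ in detail. `Quantizer` is a hypothetical name.

```java
/** Minimal sketch of uniform quantization to k-bit integer pixel values (hypothetical helper). */
final class Quantizer {

    /**
     * Maps a normalized intensity in [0, 1] to one of the 2^k integers 0..2^k - 1
     * by rounding to the nearest level. Values outside [0, 1] are clamped first.
     */
    static int quantize(double value, int k) {
        if (k < 1 || k > 30) {
            throw new IllegalArgumentException("bit depth must be in 1..30");
        }
        int maxValue = (1 << k) - 1;  // e.g., 255 for k = 8, 4095 for k = 12
        double clamped = Math.max(0.0, Math.min(1.0, value));
        return (int) Math.round(clamped * maxValue);
    }
}
```

With k = 8, an input of 1.0 maps to 255 and 0.5 maps to 128; with k = 12, 1.0 maps to 4095.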

Images  as  discrete  functions 

The result of these three stages is a description of the image in the form of a 2D, ordered matrix of integers (Fig. 1.5). Stated a bit more formally, a digital image I is a 2D function that maps from the domain of integer coordinates ℕ × ℕ to a range of possible pixel values P such that

$$I(u,v) \in P \quad \text{and} \quad u, v \in \mathbb{N}.$$

Now we are ready to transfer the image to our computer, where we can save it in the file format of our choice, compress it, and otherwise manipulate it. At this point, it is no longer important to us how the image originated, since it is now a simple 2D array of numerical data.
Before  moving  on,  we  need  a  few  more  important  definitions. 

1.4.4  Image  Size  and  Resolution 

In  the  following,  we  assume  rectangular  images,  and  while  that  is  a 
relatively  safe  assumption,  exceptions  do  exist.  The  size  of  an  image 
is  determined  directly  from  the  width  M  (number  of  columns)  and 
the  height  N  (number  of  rows)  of  the  image  matrix  I. 

The resolution of an image specifies the spatial dimensions of the image in the real world and is given as the number of image elements per unit of length; for example, dots per inch (dpi) or lines per inch (lpi) for print production, or pixels per kilometer for satellite images. In most cases, the resolution of an image is the same in the horizontal and vertical directions, which means that the image elements are square. Note that this is not always the case as, for example, the image sensors of most current video cameras have non-square pixels!

The spatial resolution of an image may not be relevant in many basic image processing steps, such as point operations or filters. Precise resolution information is, however, important in cases where geometrical elements such as circles need to be drawn on an image or when distances within an image need to be measured. For these reasons, most image formats and software systems designed for professional applications rely on precise information about image resolution.
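As a small illustration of why resolution matters for measurements, the following sketch converts pixel distances to millimeters. It assumes the resolution is given in dots per inch, allows different horizontal and vertical resolutions to cover non-square pixels, and uses 25.4 mm per inch; the class and method names are hypothetical.

```java
/** Minimal sketch: converting pixel distances to physical lengths via resolution (hypothetical helper). */
final class Resolution {
    static final double MM_PER_INCH = 25.4;

    /** Physical length (in mm) spanned by the given number of pixels at a resolution in dpi. */
    static double pixelsToMillimeters(double pixels, double dpi) {
        return pixels / dpi * MM_PER_INCH;
    }

    /**
     * Euclidean distance (in mm) between pixels (u1, v1) and (u2, v2).
     * Separate horizontal and vertical resolutions account for non-square pixels.
     */
    static double distanceMillimeters(int u1, int v1, int u2, int v2, double dpiX, double dpiY) {
        double dx = pixelsToMillimeters(u2 - u1, dpiX);
        double dy = pixelsToMillimeters(v2 - v1, dpiY);
        return Math.hypot(dx, dy);
    }
}
```

For instance, at 300 dpi in both directions, 300 pixels correspond to 25.4 mm, and the distance from (0, 0) to (300, 400) is about 42.33 mm.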


1.4.5  Image  Coordinate  System 

In order to know which position on the image corresponds to which image element, we need to impose a coordinate system. Contrary to normal mathematical conventions, in image processing the coordinate system is usually flipped in the vertical direction; that is, the v-coordinate runs from top to bottom and the origin lies in the upper left corner (Fig. 1.6). While this system has no practical or theoretical advantage, and in fact may be a bit confusing in the context of geometrical transformations, it is used almost without exception in imaging software systems. The system supposedly has its roots in the original design of television broadcast systems, where the picture rows are numbered along the vertical deflection of the electron beam, which moves from the top to the bottom of the screen. We start the numbering of rows and columns at zero for practical reasons, since in Java array indexing also begins at zero.
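The following sketch shows one common way to hold an M × N image in Java: a single `int` array stored row by row, so that pixel (u, v), with the origin at the top left and zero-based indices, lives at index v · M + u. ImageJ's pixel arrays use this row-major layout as well, but the `GrayImage` class here is a hypothetical minimal example, not ImageJ's API.

```java
/** Minimal sketch of an M x N image stored row by row in a 1D array, origin at top left. */
final class GrayImage {
    final int width;     // M, number of columns
    final int height;    // N, number of rows
    final int[] pixels;  // row-major storage

    GrayImage(int width, int height) {
        this.width = width;
        this.height = height;
        this.pixels = new int[width * height];
    }

    /** Index of pixel (u, v): u = column 0..M-1 (left to right), v = row 0..N-1 (top to bottom). */
    int index(int u, int v) {
        if (u < 0 || u >= width || v < 0 || v >= height) {
            throw new IndexOutOfBoundsException("(" + u + ", " + v + ") lies outside the image");
        }
        return v * width + u;
    }

    int get(int u, int v) {
        return pixels[index(u, v)];
    }

    void set(int u, int v, int value) {
        pixels[index(u, v)] = value;
    }
}
```

The last pixel of a 4 × 3 image, (u_max, v_max) = (3, 2), therefore sits at index 2 · 4 + 3 = 11.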


1.4.6  Pixel  Values 

The information within an image element depends on the data type used to represent it. Pixel values are practically always binary words of length k so that a pixel can represent any of 2^k different values. The value k is called the bit depth (or just “depth”) of the image. The exact bit-level layout of an individual pixel depends on the kind of image; for example, binary, grayscale, or RGB³ color. The properties of some common image types are summarized below (also see Table 1.1).

Fig. 1.5

The transformation of a continuous grayscale image F(x, y) to a discrete digital image I(u, v) (left), image detail (below).

Fig. 1.6

Image coordinates. In digital image processing, it is common to use a coordinate system where the origin (u = 0, v = 0) lies in the upper left corner. The coordinates u, v represent the columns and the rows of the image, respectively. For an image with dimensions M × N, the maximum column number is u_max = M − 1 and the maximum row number is v_max = N − 1.

Table 1.1

Bit depths of common image types and typical application domains.

Grayscale (Intensity Images):
Chan.  Bits/Pix.  Range              Use
1      1          [0, 1]             Binary image: document, illustration, fax
1      8          [0, 255]           Universal: photo, scan, print
1      12         [0, 4095]          High quality: photo, scan, print
1      14         [0, 16383]         Professional: photo, scan, print
1      16         [0, 65535]         Highest quality: medicine, astronomy

Color Images:
Chan.  Bits/Pix.  Range              Use
3      24         [0, 255]^3         RGB, universal: photo, scan, print
3      36         [0, 4095]^3        RGB, high quality: photo, scan, print
3      42         [0, 16383]^3       RGB, professional: photo, scan, print
4      32         [0, 255]^4         CMYK, digital prepress

Special Images:
Chan.  Bits/Pix.  Range              Use
1      16         [−32768, 32767]    Integer values pos./neg., increased range
1      32         ±3.4 · 10^38       Floating-point values: medicine, astronomy
1      64         ±1.8 · 10^308      Floating-point values: internal processing


Grayscale  images  (intensity  images) 

The image data in a grayscale image consist of a single channel that represents the intensity, brightness, or density of the image. In most cases, only non-negative values make sense, as the numbers represent the intensity of light energy or density of film and thus cannot be negative, so typically whole integers in the range 0, ..., 2^k − 1 are used. For example, a typical grayscale image uses k = 8 bits (1 byte) per pixel and intensity values in the range 0, ..., 255, where the value 0 represents the minimum brightness (black) and 255 the maximum brightness (white).

For many professional photography and print applications, as well as in medicine and astronomy, 8 bits per pixel is not sufficient. Image depths of 12, 14, and even 16 bits are often encountered in these domains. Note that bit depth usually refers to the number of bits used to represent one color component, not the number of bits needed to represent an entire color pixel. For example, an RGB-encoded color image with an 8-bit depth would require 8 bits for each channel, for a total of 24 bits, while the same image with a 12-bit depth would require a total of 36 bits.

3 Red, green, and blue.
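The arithmetic behind Table 1.1 is easy to reproduce: k bits per channel give unsigned values 0, ..., 2^k − 1, and a pixel with c channels needs c · k bits. The sketch below (hypothetical class `BitDepth`) also estimates the raw, uncompressed size of an image, assuming pixels are packed without any padding.

```java
/** Minimal sketch: value ranges and storage sizes implied by the bit depth (hypothetical helper). */
final class BitDepth {

    /** Largest unsigned value representable with k bits per channel: 2^k - 1. */
    static long maxValue(int k) {
        if (k < 1 || k > 62) {
            throw new IllegalArgumentException("k must be in 1..62");
        }
        return (1L << k) - 1;
    }

    /** Total bits per pixel when each of the given channels uses k bits. */
    static int bitsPerPixel(int channels, int k) {
        return channels * k;
    }

    /** Raw size in bytes of a width x height image, rounded up to whole bytes. */
    static long uncompressedBytes(int width, int height, int channels, int k) {
        long bits = (long) width * height * bitsPerPixel(channels, k);
        return (bits + 7) / 8;
    }
}
```

For example, `bitsPerPixel(3, 12)` gives the 36 bits per pixel listed in Table 1.1, and a 640 × 480 RGB image at 8 bits per channel occupies 921,600 bytes before compression.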

Binary  images 

Binary  images  are  a  special  type  of  intensity  image  where  pixels  can 
only  take  on  one  of  two  values,  black  or  white.  These  values  are 
typically  encoded  using  a  single  bit  (0/1)  per  pixel.  Binary  images 
are  often  used  for  representing  line  graphics,  archiving  documents, 
encoding  fax  transmissions,  and  of  course  in  electronic  printing. 
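Since a binary pixel needs only one bit, eight pixels fit into a byte. The sketch below packs a binary image that way; the layout (leftmost pixel in the most significant bit, each row padded to a whole number of bytes) is an assumed convention chosen for illustration, and `BinaryPacking` is a hypothetical name.

```java
/** Minimal sketch: storing a binary image at 1 bit per pixel (hypothetical helper). */
final class BinaryPacking {

    /** Packs img[v][u] (true = 1) row by row; bit 7 of each byte holds the leftmost of its 8 pixels. */
    static byte[] pack(boolean[][] img) {
        int height = img.length;
        int width = (height == 0) ? 0 : img[0].length;
        int bytesPerRow = (width + 7) / 8;  // pad each row to whole bytes
        byte[] out = new byte[height * bytesPerRow];
        for (int v = 0; v < height; v++) {
            for (int u = 0; u < width; u++) {
                if (img[v][u]) {
                    out[v * bytesPerRow + u / 8] |= 0x80 >>> (u % 8);
                }
            }
        }
        return out;
    }

    /** Reads pixel (u, v) back from the packed representation of an image with the given width. */
    static boolean get(byte[] packed, int width, int u, int v) {
        int bytesPerRow = (width + 7) / 8;
        int b = packed[v * bytesPerRow + u / 8] & 0xFF;  // treat the byte as unsigned
        return ((b >>> (7 - u % 8)) & 1) == 1;
    }
}
```

A row of 9 pixels therefore occupies 2 bytes, with the last 7 bits of the second byte serving as padding.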

Color  images 

Most color images are based on the primary colors red, green, and blue (RGB), typically making use of 8 bits for each color component. In these color images, each pixel requires 3 × 8 = 24 bits to encode all three components, and the range of each individual color component is [0, 255]. As with intensity images, color images with 30, 36, and 42 bits per pixel are commonly used in professional applications. Finally, while most color images contain three components, images with four or more color components are common in most prepress applications, typically based on the subtractive CMYK (Cyan-Magenta-Yellow-Black) color model (see Ch. 12).
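In memory, a 24-bit RGB pixel is often kept in a single 32-bit `int`. The sketch assumes the layout 0x00RRGGBB (red in bits 16 to 23, green in bits 8 to 15, blue in bits 0 to 7), which matches the component order of Java's default ARGB `int` convention with the top byte left unused; `RgbPixel` is a hypothetical helper.

```java
/** Minimal sketch: a 24-bit RGB pixel packed into one int with layout 0x00RRGGBB (hypothetical helper). */
final class RgbPixel {

    /** Packs three 8-bit components; each is masked to 8 bits to guard against out-of-range input. */
    static int pack(int r, int g, int b) {
        return ((r & 0xFF) << 16) | ((g & 0xFF) << 8) | (b & 0xFF);
    }

    static int red(int c) {
        return (c >> 16) & 0xFF;
    }

    static int green(int c) {
        return (c >> 8) & 0xFF;
    }

    static int blue(int c) {
        return c & 0xFF;
    }
}
```

For example, `pack(0x12, 0x34, 0x56)` yields `0x123456`, and `red(0x123456)` returns `0x12`.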

Indexed or palette images constitute a very special class of color image. The difference between an indexed image and a true color image is the number of different colors (fewer for an indexed image) that can be used in a particular image. In an indexed image, the pixel values are only indices (with a maximum of 8 bits) into a specific table of selected full-color values (see Sec. 12.1.1).
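An indexed image can be modeled as a byte array of palette indices plus a table of full-color entries. In this hypothetical `IndexedImage` class, palette colors use the 0x00RRGGBB layout and indices are read back as unsigned values with `& 0xFF`, because Java's `byte` type is signed (an index of 200 would otherwise read as −56).

```java
/** Minimal sketch of an indexed (palette) image: pixels hold indices into a color table. */
final class IndexedImage {
    final int width, height;
    final byte[] indices;  // one 8-bit index per pixel, stored row by row
    final int[] palette;   // 1..256 full-color entries, layout 0x00RRGGBB

    IndexedImage(int width, int height, int[] palette) {
        if (palette.length < 1 || palette.length > 256) {
            throw new IllegalArgumentException("palette must have 1..256 entries");
        }
        this.width = width;
        this.height = height;
        this.indices = new byte[width * height];
        this.palette = palette.clone();
    }

    void setIndex(int u, int v, int idx) {
        if (idx < 0 || idx >= palette.length) {
            throw new IllegalArgumentException("index not in palette");
        }
        indices[v * width + u] = (byte) idx;
    }

    /** Looks up the full-color value of pixel (u, v) in the palette. */
    int getRgb(int u, int v) {
        int idx = indices[v * width + u] & 0xFF;  // unsigned index 0..255
        return palette[idx];
    }
}
```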

Special  images 

Special images are required if none of the above standard formats is sufficient for representing the image values. Two common examples of special images are those with negative values and those with floating-point values. Images with negative values arise during image-processing steps, such as filtering for edge detection (see Sec. 6.2.2), and images with floating-point values are often found in medical, biological, or astronomical applications, where extended numerical range and precision are required. These special formats are mostly application specific and thus may be difficult to use with standard image-processing tools.


1.5  Image  File  Formats 

While in this book we almost always consider image data as being already in the form of a 2D array, ready to be accessed by a program, in practice image data must first be loaded into memory from a file. Files provide the essential mechanism for storing, archiving, and exchanging image data, and the choice of the correct file format is an important decision. In the early days of digital image processing (i.e., before around 1985), most software developers created a new custom file format for almost every new application they developed.⁴ Today there exists a wide range of standardized file formats, and developers can almost always find at least one existing format that is suitable for their application. Using standardized file formats vastly increases the ease with which images can be exchanged and the likelihood that the images will be readable by other software in the long term. Yet for many projects the selection of the right file format is not always simple, and compromises must be made. The following sub-sections outline a few of the typical criteria that need to be considered when selecting an appropriate file format.

1.5.1  Raster  versus  Vector  Data 

In the following, we will deal exclusively with file formats for storing raster images; that is, images that contain pixel values arranged in a regular matrix using discrete coordinates. In contrast, vector graphics represent geometric objects using continuous coordinates, which are only rasterized once they need to be displayed on a physical device such as a monitor or printer.

A number of standardized file formats exist for vector images, such as the ANSI/ISO standard format CGM (Computer Graphics Metafile) and SVG (Scalable Vector Graphics),⁵ as well as proprietary formats such as DXF (Drawing Exchange Format from Autodesk), AI (Adobe Illustrator), PICT (QuickDraw Graphics Metafile from Apple), and WMF/EMF (Windows Metafile and Enhanced Metafile from Microsoft). Most of these formats can contain both vector data and raster images in the same file. The PS (PostScript) and EPS (Encapsulated PostScript) formats from Adobe as well as the PDF (Portable Document Format) also offer this possibility, although they are typically used for printer output and archival purposes.⁶

1.5.2  Tagged  Image  File  Format  (TIFF) 

This is a widely used and flexible file format designed to meet the professional needs of diverse fields. It was originally developed by Aldus and later extended by Microsoft and, currently, Adobe. The format supports a range of grayscale, indexed, and true color images, but also special image types with large-depth integer and floating-point elements. A TIFF file can contain a number of images with different properties. The TIFF specification provides a range of different compression methods (LZW, ZIP, CCITT, and JPEG) and color spaces, so that it is possible, for example, to store a number of variations of an image in different sizes and representations together in a single TIFF file. The flexibility of TIFF has made it an almost universal exchange format that is widely used in archiving documents, scientific applications, digital photography, and digital video production.

4 The result was a chaotic jumble of incompatible file formats that for a long time limited the practical sharing of images between research groups.

5 www.w3.org/TR/SVG/.

6 Special variations of PS, EPS, and PDF files are also used as (editable) exchange formats for raster and vector data; for example, by both Adobe's Photoshop (Photoshop-EPS) and Illustrator (AI).

Fig. 1.7

Structure of a typical TIFF file. A TIFF file consists of a header and a linked list of image objects, three in this example. Each image object consists of a list of “tags” with their corresponding entries followed by a pointer to the actual image data.

The strength of this image format lies within its architecture (Fig. 1.7), which enables new image types and information blocks to be created by defining new “tags”. In this flexibility also lies the weakness of the format, namely that proprietary tags are not always supported, and so the “unsupported tag” error is sometimes still encountered when loading TIFF files. ImageJ also reads only a few uncompressed variations of TIFF formats,⁷ and bear in mind that most popular Web browsers currently do not support TIFF either.

1.5.3  Graphics  Interchange  Format  (GIF) 

The Graphics Interchange Format (GIF) was originally designed by CompuServe in 1986 to efficiently encode the rich line graphics used in their dial-up Bulletin Board System (BBS). It has since grown into one of the most widely used formats for representing images on the Web. This popularity is largely due to its early support for indexed color at multiple bit depths, LZW⁸ compression, interlaced image loading, and the ability to encode simple animations by storing a number of images in a single file for later sequential display. GIF is essentially an indexed image file format designed for color and grayscale images with a maximum depth of 8 bits, and consequently it does not support true color images. It offers efficient support for encoding palettes containing from 2 to 256 colors, one of which can be marked for transparency. GIF supports color tables in the range of 2, ..., 256, enabling pixels to be encoded using fewer bits. As an example, the pixels of an image using 16 unique colors require only 4 bits to store the 16 possible color values 0, ..., 15. This means that instead of storing each pixel using 1 byte, as done in other bitmap formats, GIF can encode two 4-bit pixels into each 8-bit byte. This results in a 50% storage reduction over the standard 8-bit indexed color bitmap format.

7 The ImageIO plugin offers support for a wider range of TIFF formats.

8 Lempel-Ziv-Welch.

The GIF file format is designed to efficiently encode “flat” or “iconic” images consisting of large areas of the same color. It uses lossy color quantization (see Ch. 13) to reduce the number of colors, as well as lossless LZW compression to encode such uniform areas compactly. Despite the popularity of the format, when developing new software, the PNG⁹ format, presented in the next sub-section, should be preferred, as it outperforms GIF by almost every metric.
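The storage arithmetic from the GIF discussion can be sketched as follows: the number of bits needed to index n colors is the smallest b with 2^b ≥ n, and 4-bit indices can be packed two per byte. This illustrates only the bit budget; actual GIF files store the indices as an LZW-compressed bitstream, which this hypothetical `PaletteBits` class does not implement. Putting the first pixel into the high nibble is an assumed convention.

```java
/** Minimal sketch of the storage arithmetic behind small palettes (not GIF's LZW bitstream). */
final class PaletteBits {

    /** Smallest number of bits able to index n colors, e.g., 16 colors -> 4 bits. */
    static int bitsFor(int n) {
        if (n < 2 || n > 256) {
            throw new IllegalArgumentException("n must be in 2..256");
        }
        return 32 - Integer.numberOfLeadingZeros(n - 1);
    }

    /** Packs 4-bit indices (0..15) two per byte; the first pixel of each pair goes into the high nibble. */
    static byte[] packNibbles(int[] idx) {
        byte[] out = new byte[(idx.length + 1) / 2];
        for (int i = 0; i < idx.length; i++) {
            int nibble = idx[i] & 0x0F;
            out[i / 2] |= (i % 2 == 0) ? nibble << 4 : nibble;
        }
        return out;
    }
}
```

With 16 colors, `bitsFor(16)` returns 4, and 16 pixels pack into 8 bytes instead of 16, the 50% saving described in the GIF discussion.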

1.5.4  Portable  Network  Graphics  (PNG) 

PNG (pronounced “ping”) was originally developed as a replacement for the GIF file format when licensing issues¹⁰ arose because of its use of LZW compression. It was designed as a universal image format especially for use on the Internet, and, as such, PNG supports three different types of images:

•  true  color  images  (with  up  to  3  x  16  bits/pixel), 

•  grayscale  images  (with  up  to  16  bits/pixel), 

•  indexed  color  images  (with  up  to  256  colors). 

Additionally, PNG includes an alpha channel for transparency with a maximum depth of 16 bits. In comparison, the transparency channel of a GIF image is only a single bit deep. While the format only supports a single image per file, it is exceptional in that it allows images of up to (2^31 − 1) × (2^31 − 1) pixels. The format supports lossless compression by means of a variation of PKZIP (Phil Katz's ZIP). No lossy compression is available, as PNG was not designed as a replacement for JPEG. Ultimately, the PNG format meets or exceeds the capabilities of the GIF format in every way except GIF's ability to include multiple images in a single file to create simple animations. Currently, PNG should be considered the format of choice for representing losslessly compressed true color images for use on the Web.

1.5.5  JPEG 

The JPEG standard defines a compression method for continuous-tone grayscale and color images, such as those that would arise from nature photography. The format was developed by the Joint Photographic Experts Group (JPEG)¹¹ with the goal of achieving an average data reduction of a factor of 1:16 and was established in 1990 as ISO Standard IS-10918. Today it is the most widely used image file format. In practice, JPEG achieves, depending on the application, compression in the order of 1 bit per pixel (i.e., a compression factor of around 1:25) when compressing 24-bit color images to an acceptable quality for viewing. The JPEG standard supports images with up to 255 color components, and what has become increasingly important is its support for CMYK images (see Sec. 12.2.5).

9 Portable Network Graphics.

10 Unisys's U.S. LZW Patent No. 4,558,302 expired on June 20, 2003.

11 www.jpeg.org.

The modular design of the JPEG compression algorithm [163] allows for variations of the “baseline” algorithm; for example, there exists a lossless version, though it is not often used. In the case of RGB images, the core of the algorithm consists of three main steps:

1. Color conversion and down sampling: A color transformation from RGB into the YCbCr space (see Ch. 12, Sec. 12.2.4) is used to separate the actual color components from the brightness Y component. Since the human visual system is less sensitive to rapid changes in color, it is possible to compress the color components more, resulting in a significant data reduction, without a subjective loss in image quality.

2. Cosine transform and quantization in frequency space:

The image is divided up into a regular grid of 8 × 8 blocks, and for each independent block, the frequency spectrum is computed using the discrete cosine transformation (see Ch. 20). Next, the 64 spectral coefficients of each block are quantized using a quantization table. The entries of this table largely determine the eventual compression ratio, and therefore the visual quality, of the image.

In general, the high frequency coefficients, which are essential for the “sharpness” of the image, are reduced most during this step. During decompression these high frequency values will be approximated by computed values.

3. Lossless compression: Finally, the data stream of quantized spectral components is again compressed using a lossless method, such as arithmetic or Huffman encoding, in order to remove the last remaining redundancy in the data stream.
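Step 1 can be sketched with the RGB-to-YCbCr conversion coefficients commonly cited for JFIF (full-range, 8-bit, derived from ITU-R BT.601). Treat the exact constants as an assumption to be checked against the JFIF specification; the class name `YCbCr` is hypothetical, and the chroma down sampling that usually follows (for example, averaging 2 × 2 blocks of Cb and Cr) is omitted.

```java
/** Minimal sketch of an RGB -> YCbCr conversion with the coefficients commonly cited for JFIF. */
final class YCbCr {

    /** Converts 8-bit R, G, B to {Y, Cb, Cr}; each result is rounded and clamped to [0, 255]. */
    static int[] fromRgb(int r, int g, int b) {
        double y  =  0.299    * r + 0.587    * g + 0.114    * b;        // luminance
        double cb = -0.168736 * r - 0.331264 * g + 0.5      * b + 128;  // blue-difference chroma
        double cr =  0.5      * r - 0.418688 * g - 0.081312 * b + 128;  // red-difference chroma
        return new int[] { clamp(y), clamp(cb), clamp(cr) };
    }

    private static int clamp(double x) {
        return (int) Math.max(0, Math.min(255, Math.round(x)));
    }
}
```

White (255, 255, 255) maps to (255, 128, 128) and black to (0, 128, 128): gray values carry no chroma, which is centered at 128.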

The JPEG compression method combines a number of different compression methods, and its complexity should not be underestimated. Implementing even the baseline version is nontrivial, so application support for JPEG increased sharply once the Independent JPEG Group (IJG)¹² made available a reference implementation of the JPEG algorithm in 1991. Drawbacks of the JPEG compression algorithm include its limitation to 8-bit images, its poor performance on non-photographic images such as line art (for which it was not designed), its handling of abrupt transitions within an image, and the striking artifacts caused by the 8 × 8 pixel blocks at high compression rates. Figure 1.9 shows the results of compressing a section of a grayscale image using different quality factors (Photoshop Q_JPG = 10, 5, 1).
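The 8 × 8 block transform and quantization of step 2 can be sketched directly from the DCT-II formula with JPEG's scaling, F(u, v) = ¼ C(u) C(v) Σ_x Σ_y f(x, y) cos((2x + 1)uπ/16) cos((2y + 1)vπ/16), where C(0) = 1/√2 and C(k) = 1 otherwise. This is a slow reference version, not how production codecs compute it; the level shift by −128 that JPEG applies to 8-bit samples is left to the caller, and the uniform quantization table in the usage note is hypothetical (typical tables use larger divisors for high frequencies). `BlockDct` is a hypothetical name.

```java
/** Minimal sketch of a JPEG-style 8x8 forward DCT followed by quantization. */
final class BlockDct {
    static final int N = 8;

    /** Forward 2D DCT-II of an 8x8 block g[y][x], scaled as in JPEG; returns G[v][u]. */
    static double[][] forward(double[][] g) {
        double[][] G = new double[N][N];
        for (int v = 0; v < N; v++) {
            for (int u = 0; u < N; u++) {
                double sum = 0;
                for (int y = 0; y < N; y++) {
                    for (int x = 0; x < N; x++) {
                        sum += g[y][x]
                             * Math.cos((2 * x + 1) * u * Math.PI / (2 * N))
                             * Math.cos((2 * y + 1) * v * Math.PI / (2 * N));
                    }
                }
                double cu = (u == 0) ? Math.sqrt(0.5) : 1.0;
                double cv = (v == 0) ? Math.sqrt(0.5) : 1.0;
                G[v][u] = 0.25 * cu * cv * sum;
            }
        }
        return G;
    }

    /** Divides each coefficient by its table entry and rounds; larger entries discard more detail. */
    static int[][] quantize(double[][] G, int[][] table) {
        int[][] q = new int[N][N];
        for (int v = 0; v < N; v++) {
            for (int u = 0; u < N; u++) {
                q[v][u] = (int) Math.round(G[v][u] / table[v][u]);
            }
        }
        return q;
    }
}
```

For a constant block of value 100, only the DC coefficient is nonzero (F(0, 0) = 800); dividing it by a table entry of 16 yields 50, while all other coefficients quantize to 0.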

JPEG  File  Interchange  Format  (JFIF) 

Despite common usage, JPEG is not a file format; it is “only” a method of compressing image data. The actual JPEG standard only specifies the JPEG codec (compressor and decompressor) and by design leaves the wrapping, or file format, undefined.¹³ What is normally referred to as a JPEG file is almost always an instance of a “JPEG File Interchange Format” (JFIF) file, originally developed by Eric Hamilton and the IJG. JFIF specifies a file format based on the JPEG standard by defining the remaining necessary elements of a file format. The JPEG standard leaves some parts of the codec undefined for generality, and in these cases JFIF makes a specific choice. As an example, in step 1 of the JPEG codec, the specific color space used in the color transformation is not part of the JPEG standard, so it is specified by the JFIF standard. Likewise, the use of different compression ratios for color and luminance is a practical implementation decision specified by JFIF and is not a part of the actual JPEG encoder.

12 www.ijg.org.

Fig. 1.8

JPEG compression of an RGB image. Using a color space transformation, the color components Cb, Cr are separated from the Y luminance component and subjected to a higher rate of compression. Each of the three components is then run independently through the JPEG compression pipeline, and the results are merged into a single JPEG data stream. Decompression follows the same stages in reverse order.

Exchangeable  Image  File  Format  (EXIF) 

The Exchangeable Image File Format (EXIF) is a variant of the JPEG (JFIF) format designed for storing image data originating on digital cameras, and to that end it supports storing metadata such as the type of camera, date and time, photographic parameters such as aperture and exposure time, as well as geographical (GPS) data. EXIF was developed by the Japan Electronics and Information Technology Industries Association (JEITA) as a part of the DCF¹⁴ guidelines and is used today by practically all manufacturers as the standard format for storing digital images on memory cards. Internally, EXIF uses TIFF to store the metadata information and JPEG to encode a thumbnail preview image. The file structure is designed so that it can be processed by existing JPEG/JFIF readers without a problem.

JPEG-2000 

JPEG-2000, which is specified by an ISO-ITU standard (“Coding of Still Pictures”),¹⁵ was designed to overcome some of the better-known weaknesses of the traditional JPEG codec. Among the improvements made in JPEG-2000 are the use of larger, 64 × 64 pixel blocks and replacement of the discrete cosine transform by the wavelet transform. These and other improvements enable it to achieve significantly higher compression ratios than JPEG, up to 0.25 bits per pixel on RGB color images. Despite these advantages, JPEG-2000 is supported by only a few image-processing applications and Web browsers.¹⁶

13 To be exact, the JPEG standard only defines how to compress the individual components and the structure of the JPEG stream.

14 Design Rule for Camera File System.

15 www.jpeg.org/JPEG2000.htm.

16 At this time, ImageJ does not offer JPEG-2000 support.

Fig. 1.9

Artifacts arising from JPEG compression. A section of the original image (a) and the results of JPEG compression at different quality factors: Q_JPG = 10 (b), Q_JPG = 5 (c), and Q_JPG = 1 (d). In parentheses are the resulting file sizes for the complete (dimensions 274 × 274) image.

[Panel (a): Original (75.08 kB)]
Fig. 1.10

Example of a PGM file in human-readable text format (top) and the corresponding grayscale image (below).


1.5.6  Windows  Bitmap  (BMP) 

The Windows Bitmap (BMP) format is a simple file format, widely used under Windows, supporting grayscale, indexed, and true color images. It also supports binary images, but not in an efficient manner, since each pixel is stored using an entire byte. Optionally, the format supports simple lossless, run-length-based compression. While BMP offers storage for a similar range of image types as TIFF, it is a much less flexible format.

1.5.7  Portable  Bitmap  Format  (PBM) 

The Portable Bitmap Format (PBM) family¹⁷ consists of a series of very simple file formats that are exceptional in that they can be optionally saved in a human-readable text format that can be easily read in a program or simply edited using a text editor. A simple PGM image is shown in Fig. 1.10. The characters P2 in the first line indicate that the image is a PGM (“plain”) file stored in human-readable format. The next line shows how comments can be inserted directly into the file by beginning the line with the # symbol. Line three gives the image's dimensions, in this case width 17 and height 7, and line four defines the maximum pixel value, in this case 255. The remaining lines give the actual pixel values. This format makes it easy to create and store image data without any explicit imaging API, since it requires only basic text I/O that is available in any programming environment. In addition, the format supports a much more machine-optimized “raw” output mode in which pixel values are stored as bytes. PBM is widely used under Unix and supports the following formats: PBM (portable bitmap) for binary bitmaps, PGM (portable graymap) for grayscale images, and PPM (portable pixmap) for color images; collectively they are referred to as PNM (portable anymap). PGM images can be opened by ImageJ.
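Because the plain PGM format is pure text, writing one takes only string operations, as the description of Fig. 1.10 points out. This sketch (hypothetical class `PlainPgm`) emits the header lines in the order described for the P2 format (magic number, a comment, width and height, maximum value), followed by one text line per image row. It does not wrap long lines and assumes the comment contains no line breaks.

```java
/** Minimal sketch: producing a "plain" (P2) PGM file as text, with no imaging API. */
final class PlainPgm {

    /** Returns the P2 representation of a grayscale image given as pixels[row][column]. */
    static String toP2(int[][] pixels, int maxValue, String comment) {
        int height = pixels.length;
        int width = (height == 0) ? 0 : pixels[0].length;
        StringBuilder sb = new StringBuilder();
        sb.append("P2\n");                                        // magic number: plain PGM
        sb.append("# ").append(comment).append('\n');             // comment line
        sb.append(width).append(' ').append(height).append('\n'); // image dimensions
        sb.append(maxValue).append('\n');                         // maximum pixel value
        for (int[] row : pixels) {                                // one text line per image row
            for (int u = 0; u < row.length; u++) {
                if (u > 0) {
                    sb.append(' ');
                }
                sb.append(row[u]);
            }
            sb.append('\n');
        }
        return sb.toString();
    }
}
```

The resulting string can be saved with `java.nio.file.Files.writeString(Path.of("test.pgm"), text)` on Java 11 or later.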
17 http://netpbm.sourceforge.net.

1.5.8 Additional File Formats

For most practical applications, one of the following file formats is sufficient: TIFF as a universal format supporting a wide variety of uncompressed images, JPEG/JFIF for digital color photos when storage size is a concern, and either PNG or GIF when an image is destined for use on the Web. In addition, there exist countless other file formats, such as those encountered in legacy applications or in special application areas where they are traditionally used. A few of the more commonly encountered types are:

•  RGB,  a  simple  format  from  Silicon  Graphics. 

•  RAS  (Sun  Raster  Format),  a  simple  format  from  Sun  Micro¬ 
systems. 

•  TGA  (Truevision  Targa  File  Format),  the  first  24-bit  file  format 
for  PCs.  It  supports  numerous  image  types  with  8-  to  32-bit 
depths  and  is  still  used  in  medicine  and  biology. 

•  XBM/XPM (X Window Bitmap/Pixmap), a group of ASCII-encoded formats used in the X Window System and similar to PBM/PGM.

1.5.9  Bits  and  Bytes 

Today, opening, reading, and writing image files are mostly carried out by means of existing software libraries. Yet sometimes you still need to deal with the structure and contents of an image file at the byte level, for instance when you need to read an unsupported file format or when you receive a file where the format of the data is unknown.

Big  endian  and  little  endian 

In the standard model of a computer, a file consists of a simple sequence of 8-bit bytes, and a byte is the smallest unit that can be read from or written to a file. In contrast, the image elements as they are stored in memory are usually larger than a byte; for example, a 32-bit int value (= 4 bytes) is used for an RGB color pixel. The problem is that the four individual bytes that make up the image data can be stored in different orders. In order to correctly recreate the original color pixel, we must know the order in which the bytes in the file are arranged.

Consider, for example, a 32-bit int number z with the binary and hexadecimal values18

$$
z = 00010010001101000101011001111000_B = 12345678_H, \tag{1.2}
$$

where $00010010_B = 12_H$ is the value of the most significant byte (MSB) and $01111000_B = 78_H$ the least significant byte (LSB). If the individual bytes are saved to the file in order from MSB to LSB, we call the ordering “big endian”; if they are saved in the opposite order, we call it “little endian”. Thus the 32-bit value z from Eqn. (1.2) could be stored in one of the following two modes:


Ordering        Byte sequence   1      2      3      4
big endian      MSB → LSB       12_H   34_H   56_H   78_H
little endian   LSB → MSB       78_H   56_H   34_H   12_H
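Java’s java.nio.ByteBuffer can serialize an int in either byte order, which makes both rows of the table above easy to reproduce. The sketch below (class and method names are our own, chosen for illustration) also reassembles the value by hand with shifts and masks, which is what a reader for a given file format has to do when it processes raw bytes.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

public class EndianDemo {

    // Serializes a 32-bit int into 4 bytes in the requested byte order.
    public static byte[] toBytes(int z, ByteOrder order) {
        return ByteBuffer.allocate(4).order(order).putInt(z).array();
    }

    // Reassembles a 32-bit int from 4 little-endian bytes (b[0] = LSB).
    // The "& 0xff" masks undo Java's sign extension of byte values.
    public static int fromLittleEndian(byte[] b) {
        return (b[0] & 0xff)
             | (b[1] & 0xff) << 8
             | (b[2] & 0xff) << 16
             | (b[3] & 0xff) << 24;
    }

    // Reassembles a 32-bit int from 4 big-endian bytes (b[0] = MSB).
    public static int fromBigEndian(byte[] b) {
        return (b[0] & 0xff) << 24
             | (b[1] & 0xff) << 16
             | (b[2] & 0xff) << 8
             | (b[3] & 0xff);
    }

    public static void main(String[] args) {
        int z = 0x12345678;
        byte[] big = toBytes(z, ByteOrder.BIG_ENDIAN);
        byte[] little = toBytes(z, ByteOrder.LITTLE_ENDIAN);
        System.out.println(Arrays.toString(big));          // [18, 52, 86, 120]
        System.out.println(Arrays.toString(little));       // [120, 86, 52, 18]
        System.out.println(fromLittleEndian(little) == z); // true
    }
}
```

The decimal values printed are simply 12_H, 34_H, 56_H, and 78_H; reading little-endian bytes with fromBigEndian() would silently produce the wrong number, which is exactly the error the byte-order convention guards against.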

Even though correctly ordering the bytes should essentially be the responsibility of the operating and file systems, in practice it depends on the architecture of the processor.19 Processors from the Intel family (e.g., x86, Pentium) are traditionally little endian, and processors from other manufacturers (e.g., IBM, MIPS, Motorola, Sun) are big endian.20 Big endian is also called network byte ordering since in the IP protocol the data bytes are arranged in MSB to LSB order during transmission.

18 The decimal value of z is 305419896.

Table 1.2  Signatures of various image file formats. Most image file formats can be identified by inspecting the first bytes of the file. These byte sequences, or signatures, are listed in hexadecimal (0x..) form and as ASCII text (□ indicates a nonprintable character).

Format        Signature       ASCII
PNG           0x89504e47      □PNG
JPEG/JFIF     0xffd8ffe0      □□□□
TIFF little   0x49492a00      II*□
TIFF big      0x4d4d002a      MM□*
BMP           0x424d          BM
GIF           0x4749463839    GIF89
Photoshop     0x38425053      8BPS
PS/EPS        0x25215053      %!PS

To  correctly  interpret  image  data  with  multi-byte  pixel  values, 
it  is  necessary  to  know  the  byte  ordering  used  when  creating  it.  In 
most  cases,  this  is  fixed  and  defined  by  the  file  format,  but  in  some  file 
formats,  for  example  TIFF,  it  is  variable  and  depends  on  a  parameter 
given  in  the  file  header  (see  Table  1.2). 

File  headers  and  signatures 

Practically  all  image  file  formats  contain  a  data  header  consisting 
of  important  information  about  the  layout  of  the  image  data  that 
follows.  Values  such  as  the  size  of  the  image  and  the  encoding  of 
the  pixels  are  usually  present  in  the  file  header  to  make  it  easier 
for  programmers  to  allocate  the  correct  amount  of  memory  for  the 
image.  The  size  and  structure  of  this  header  are  usually  fixed,  but 
in  some  formats,  such  as  TIFF,  the  header  can  contain  pointers  to 
additional  subheaders. 
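To illustrate how a header and a fixed byte order work together, here is a sketch that extracts the image size from a BMP file header. It assumes the common BMP layout, a 14-byte file header (starting with “BM”) followed by a BITMAPINFOHEADER, in which width and height are stored as 32-bit little-endian integers at byte offsets 18 and 22. Other header variants (e.g., the older core header with 16-bit fields) are not handled, and the class name is our own.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class BmpHeaderReader {

    // Returns {width, height} read from the first 26 bytes of a BMP file.
    // Assumption: BITMAPINFOHEADER layout, little-endian 32-bit fields.
    public static int[] readSize(byte[] head) {
        if (head.length < 26 || head[0] != 'B' || head[1] != 'M') {
            throw new IllegalArgumentException("not a BMP header");
        }
        ByteBuffer buf = ByteBuffer.wrap(head).order(ByteOrder.LITTLE_ENDIAN);
        int width = buf.getInt(18);
        int height = buf.getInt(22); // negative for top-down BMP images
        return new int[] { width, height };
    }

    public static void main(String[] args) {
        // Build a synthetic header for a 640 x 480 image.
        byte[] head = new byte[26];
        head[0] = 'B';
        head[1] = 'M';
        ByteBuffer.wrap(head).order(ByteOrder.LITTLE_ENDIAN)
                  .putInt(18, 640)
                  .putInt(22, 480);
        int[] size = readSize(head);
        System.out.println(size[0] + " x " + size[1]); // 640 x 480
    }
}
```

Reading the same bytes with the default (big-endian) order of ByteBuffer would yield nonsensical dimensions, which is why the format’s byte order must be known before the header can be interpreted.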

In order to interpret the information in the header, it is necessary to know the file type. In many cases, this can be determined by the file name extension (e.g., .jpg or .tif), but since these extensions are not standardized and can be changed at any time by the user, they are not a reliable way of determining the file type. Instead, many file types can be identified by their embedded “signature”, which is usually found in the first few bytes of the file. Signatures of a number of popular image formats are listed in Table 1.2, in hexadecimal (0x..) form and as ASCII text. A PNG file always begins with the 4-byte sequence 0x89, 0x50, 0x4e, 0x47, which is the “magic number” 0x89 followed by the ASCII sequence “PNG”. Sometimes the signature not only identifies the type of image file but also contains information about its encoding; for instance, in TIFF the first two characters are either II for “Intel” or MM for “Motorola” and indicate the byte ordering (little endian or big endian, respectively) of the image data in the file.
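A simple way to apply Table 1.2 in code is to compare the first bytes of a file against each signature. The following sketch (our own class, not a library API) does exactly that and nothing more: it knows only the signatures listed in the table, so, for example, JPEG files that start with an EXIF segment (0xffd8ffe1 instead of 0xffd8ffe0) or GIF files of the older GIF87a variant are reported as unknown. Reading with InputStream.readNBytes() requires Java 9 or later.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

public class SignatureSniffer {

    // True if 'data' begins with the given signature bytes (as 0..255 ints).
    private static boolean startsWith(byte[] data, int... sig) {
        if (data.length < sig.length) return false;
        for (int i = 0; i < sig.length; i++) {
            if ((data[i] & 0xff) != sig[i]) return false;
        }
        return true;
    }

    // Guesses the file type from its first bytes, using the Table 1.2 signatures.
    public static String identify(byte[] head) {
        if (startsWith(head, 0x89, 0x50, 0x4e, 0x47)) return "PNG";
        if (startsWith(head, 0xff, 0xd8, 0xff, 0xe0)) return "JPEG/JFIF";
        if (startsWith(head, 0x49, 0x49, 0x2a, 0x00)) return "TIFF (little endian)";
        if (startsWith(head, 0x4d, 0x4d, 0x00, 0x2a)) return "TIFF (big endian)";
        if (startsWith(head, 0x47, 0x49, 0x46, 0x38, 0x39)) return "GIF";
        if (startsWith(head, 0x38, 0x42, 0x50, 0x53)) return "Photoshop";
        if (startsWith(head, 0x25, 0x21, 0x50, 0x53)) return "PS/EPS";
        if (startsWith(head, 0x42, 0x4d)) return "BMP";
        return "unknown";
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: java SignatureSniffer <file>");
            return;
        }
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            byte[] head = new byte[8];
            int n = in.readNBytes(head, 0, head.length); // may be < 8 for tiny files
            System.out.println(identify(Arrays.copyOf(head, n)));
        }
    }
}
```

For TIFF, the result also tells a reader which byte order to use for all subsequent multi-byte values in the file.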

19  At  least  the  ordering  of  the  bits  within  a  byte  is  almost  universally 
uniform. 

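20 In Java, this problem does not arise within a program, since the language hides the processor’s native byte order; Java’s standard data streams (DataInputStream, DataOutputStream) read and write multi-byte values in big-endian order.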


1.6  Exercises 


Exercise 1.1. Determine the actual physical measurement in millimeters of an image with 1400 rectangular pixels and a resolution of 72 dpi.

Exercise 1.2. A camera with a focal length of f = 50 mm is used to take a photo of a vertical column that is 12 m high and is 95 m away from the camera. Determine (a) its height in the image in mm and (b) the number of pixels it spans, assuming the camera has a resolution of 4000 dpi.

Exercise 1.3. The image sensor of a particular digital camera contains 2016 × 3024 pixels. The geometry of this sensor is identical to that of a traditional 35 mm camera (with an image size of 24 × 36 mm) except that it is 1.6 times smaller. Compute the resolution of this digital sensor in dpi.

Exercise 1.4. Assume the camera geometry described in Exercise 1.3 combined with a lens with focal length f = 50 mm. What amount of blurring (in pixels) would be caused by a uniform, 0.1° horizontal turn of the camera during exposure? Recompute this for f = 300 mm. Consider whether the extent of the blurring also depends on the distance of the object.

Exercise 1.5. Determine the number of bytes necessary to store an uncompressed binary image of size 4000 × 3000 pixels.

Exercise 1.6. Determine the number of bytes necessary to store an uncompressed RGB color image of size 640 × 480 pixels using 8, 10, 12, and 14 bits per color channel.

Exercise 1.7. Given a black and white television with a resolution of 625 × 512 8-bit pixels and a frame rate of 25 images per second: (a) How many different images can this device ultimately display, and how long would you have to watch it (assuming no sleeping) in order to see every possible image at least once? (b) Perform the same calculation for a color television with 3 × 8 bits per pixel.

Exercise  1.8.  Show  that  the  projection  of  a  3D  straight  line  in  a 
pinhole  camera  (assuming  perspective  projection  as  defined  in  Eqn. 
(1.1))  is  again  a  straight  line  in  the  resulting  2D  image. 

Exercise 1.9. Using Fig. 1.10 as a model, use a text editor to create a PGM file, disk.pgm, containing an image of a bright circle. Open your image with ImageJ and then try to find other programs that can open and display the image.
ImageJ


Until a few years ago, the image-processing community was a relatively small group of people who either had access to expensive commercial image-processing tools or, out of necessity, developed their own software packages. Usually such home-brew environments started out with small software components for loading and storing images from and to disk files. This was not always easy because often one had to deal with poorly documented or even proprietary file formats. An obvious (and frequent) solution was to simply design a new image file format from scratch, usually optimized for a particular field, application, or even a single project, which naturally led to a myriad of different file formats, many of which did not survive and are forgotten today [163, 168]. Nevertheless, writing software for converting between all these file formats in the 1980s and early 1990s was an important business that occupied many people. Displaying images on computer screens was similarly difficult, because there was only marginal support from operating systems, APIs, and display hardware, and capturing images or videos into a computer was close to impossible on common hardware. It thus may have taken many weeks or even months before one could do just elementary things with images on a computer and finally do some serious image processing.

Fortunately, the situation is much different today. Only a few common image file formats have survived (see also Sec. 1.5), which are readily handled by many existing tools and software libraries. Most standard APIs for C/C++, Java, and other popular programming languages already come with at least some basic support for working with images and other types of media data. While there is still much development work going on at this level, it makes our job a lot easier and, in particular, allows us to focus on the more interesting aspects of digital imaging.


2.1  Software  for  Digital  Imaging 


Traditionally, software for digital imaging has targeted either the manipulation or the processing of images, serving either practitioners and designers or software programmers, whose requirements differ considerably.

Software packages for manipulating images, such as Adobe Photoshop, Corel Paint, and others, usually offer a convenient user interface and a large number of readily available functions and tools for working with images interactively. Sometimes it is possible to extend the standard functionality by writing scripts or adding self-programmed components. For example, Adobe provides a special API1 for programming Photoshop “plugins” in C++, though this is a nontrivial task and certainly too complex for nonprogrammers.

In contrast to the aforementioned category of tools, digital image processing software primarily aims at the requirements of algorithm and software developers, scientists, and engineers working with images, where interactivity and ease of use are not the main concerns. Instead, these environments mostly offer comprehensive and well-documented software libraries that facilitate the implementation of new image-processing algorithms, prototypes, and working applications. Popular examples are Khoros/AccuSoft,2 MatLab,3 and ImageMagick,4 among many others. In addition to the support for conventional programming (typically with C/C++), many of these systems provide dedicated scripting languages or visual programming aids that can be used to construct even highly complex processes in a convenient and safe fashion.

In  practice,  image  manipulation  and  image  processing  are  of 
course  closely  related.  Although  Photoshop,  for  example,  is  aimed 
at image manipulation by nonprogrammers, the software itself implements many traditional image-processing algorithms. The same is
true  for  many  Web  applications  using  server-side  image  processing, 
such  as  those  based  on  ImageMagick.  Thus  image  processing  is  really 
at  the  base  of  any  image  manipulation  software  and  certainly  not  an 
entirely  different  category. 


1 www.adobe.com/products/photoshop/.

2 www.accusoft.com.

3 www.mathworks.com.

4 www.imagemagick.org.

2.2 ImageJ Overview

ImageJ, the software that is used for this book, is a combination of both worlds discussed in the previous section. It offers a set of ready-made tools for viewing and interactive manipulation of images but can also be extended easily by writing new software components in a “real” programming language. ImageJ is implemented entirely in Java and is thus largely platform-independent, running without modification under Windows, MacOS, or Linux. Java’s dynamic execution model allows new modules (“plugins”) to be written as independent pieces of Java code that can be compiled, loaded, and executed “on the fly” in the running system without even having to restart ImageJ. This quick turnaround makes ImageJ an ideal platform for developing and testing new image-processing techniques and algorithms. Since Java has become extremely popular as a first programming language in many engineering curricula, it is usually quite easy for students to get started in ImageJ without having to spend much time learning another programming language. Also, ImageJ is freely available, so students, instructors, and practitioners can install and use the software legally and without license charges on any computer. ImageJ is thus an ideal platform for education and self-training in digital image processing but is also in regular use for serious research and application development at many laboratories around the world, particularly in biological and medical imaging.

ImageJ was (and still is) developed by Wayne Rasband [193] at the U.S. National Institutes of Health (NIH), originally as a substitute for its predecessor, NIH-Image, which was only available for the Apple Macintosh platform. The current version of ImageJ, updates, documentation, the complete source code, test images, and a continuously growing collection of third-party plugins can be downloaded from the ImageJ website.5 Installation is simple, with detailed instructions available online, in Werner Bailer’s programming tutorial [12], and in the authors’ ImageJ Short Reference [40].

In addition to ImageJ itself, there are several popular software projects that build on or extend ImageJ. This includes in particular Fiji6 (“Fiji Is Just ImageJ”), which offers a consistent collection of numerous plugins, simple installation on various platforms, and excellent documentation. All programming examples (plugins) shown in this book should also execute in Fiji without any modifications. Another important development is ImgLib2, which is a generic Java API for representing and processing n-dimensional images in a consistent fashion. ImgLib2 also provides the underlying data model for ImageJ2,7 which is a complete reimplementation of ImageJ.




Wayne  Rasband  (right)  at  the 
1st  ImageJ  Conference  2006 
(picture  courtesy  of  Marc  Seil, 
CRP  Henri  Tudor, 
Luxembourg). 


2.2.1  Key  Features 

As  a  pure  Java  application,  ImageJ  should  run  on  any  computer 
for which a current Java runtime environment (JRE) exists. ImageJ comes with its own Java runtime, so Java need not be installed
separately  on  the  computer.  Under  the  usual  restrictions,  ImageJ  can 
be  run  as  a  Java  “applet”  within  a  Web  browser,  though  it  is  mostly 
used  as  a  stand-alone  application.  It  is  sometimes  also  used  on  the 
server  side  in  the  context  of  Java-based  Web  applications  (see  [12] 
for  details).  In  summary,  the  key  features  of  ImageJ  are: 

•  A  set  of  ready-to-use,  interactive  tools  for  creating,  visualizing, 
editing,  processing,  analyzing,  loading,  and  storing  images,  with 
support  for  several  common  file  formats.  ImageJ  also  provides 
“deep”  16-bit  integer  images,  32-bit  floating-point  images,  and 
image  sequences  (“stacks”). 

5  http://rsb.info.nih.gov/ij/. 

6  http://fiji.sc. 

7 http://imagej.net/ImageJ2. To avoid confusion, the “classic” ImageJ platform is sometimes referred to as “ImageJ1” or simply “IJ1”.
•  A simple plugin mechanism for extending the core functionality
of  ImageJ  by  writing  (usually  small)  pieces  of  Java  code.  All 
coding  examples  shown  in  this  book  are  based  on  such  plugins. 

•  A macro language and the corresponding interpreter, which make it easy to implement larger processing blocks by combining existing functions without any knowledge of Java. Macros are not discussed in this book, but details can be found in ImageJ’s online documentation.8

2.2.2  Interactive  Tools 

When  ImageJ  starts  up,  it  first  opens  its  main  window  (Fig.  2.1), 
which  includes  the  following  menu  entries: 

•  File:  for  opening,  saving,  and  creating  new  images. 

•  Edit:  for  editing  and  drawing  in  images. 

•  Image: for modifying and converting images, geometric operations.

•  Process:  for  image  processing,  including  point  operations,  filters, 
and  arithmetic  operations  between  multiple  images. 

•  Analyze:  for  statistical  measurements  on  image  data,  histograms, 
and  special  display  formats. 

•  Plugins: for editing, compiling, executing, and managing user-defined plugins.

The  current  version  of  ImageJ  can  open  images  in  several  common 
formats,  including  TIFF  (uncompressed  only),  JPEG,  GIF,  PNG, 
and  BMP,  as  well  as  the  formats  DICOM9  and  FITS,10  which  are 
popular  in  medical  and  astronomical  image  processing,  respectively. 
As is common in most image-editing programs, all interactive operations are applied to the currently active image, i.e., the image most recently selected by the user. ImageJ provides a simple (single-step) “undo” mechanism for most operations, which can also revert modifications effected by user-defined plugins.

2.2.3  ImageJ  Plugins 

Plugins are small Java modules for extending the functionality of ImageJ by using a simple standardized interface (Fig. 2.2). Plugins can be created, edited, compiled, invoked, and organized through the Plugins menu in ImageJ’s main window (Fig. 2.1). Plugins can be grouped to improve modularity, and plugin commands can be arbitrarily placed inside the main menu structure. Also, many of ImageJ’s built-in functions are actually implemented as plugins themselves.

Program  structure 

Technically speaking, plugins are Java classes that implement a particular interface specification defined by ImageJ. There are two main types of plugins:

8  http://rsb.info.nih.gov/ij/developer/macro/macros.html. 

9  Digital  Imaging  and  Communications  in  Medicine. 

10  Flexible  Image  Transport  System. 

Fig. 2.1  ImageJ main window (under Windows).

Fig. 2.2  ImageJ software structure (simplified). ImageJ is based on the Java core system and depends in particular upon Java’s Abstract Window Toolkit (AWT) for the implementation of the user interface and the presentation of image data. Plugins are small Java classes that extend the functionality of the basic ImageJ system.


•  PlugIn: requires no image to be open to start a plugin.

•  PlugInFilter: the currently active image is passed to the plugin when started.

Throughout the examples in this book, we almost exclusively use plugins of the second type (i.e., PlugInFilter) for implementing image-processing operations. The interface specification requires that any plugin of type PlugInFilter must at least implement two methods, setup() and run(), with the following signatures:

int setup(String args, ImagePlus im)

When the plugin is started, ImageJ calls this method first to verify that the capabilities of this plugin match the target image. setup() returns a vector of binary flags (packaged as a 32-bit int value) that describes the plugin’s properties.

void run(ImageProcessor ip)

This method does the actual work for this plugin. It is passed a single argument ip, an object of type ImageProcessor, which contains the image to be processed and all relevant information about it. The run() method returns no result value (void) but may modify the passed image and create new images.
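The “vector of binary flags” returned by setup() is an ordinary Java int in which each bit stands for one property. The following standalone sketch illustrates this idiom with hypothetical constant names and values chosen only for this example (they are not ImageJ’s); in an actual PlugInFilter, the interface’s own constants, such as DOES_8G, would be combined in the same way with the bitwise OR operator |.

```java
public class FlagDemo {

    // Hypothetical capability flags, one bit each. These names and values are
    // illustrative stand-ins, NOT the constants defined by ImageJ's PlugInFilter.
    public static final int ACCEPTS_GRAY8  = 1;       // bit 0
    public static final int ACCEPTS_GRAY16 = 1 << 1;  // bit 1
    public static final int ACCEPTS_RGB    = 1 << 2;  // bit 2
    public static final int NO_CHANGES     = 1 << 3;  // bit 3

    // What a setup()-style method might return: several flags OR-ed together.
    public static int capabilities() {
        return ACCEPTS_GRAY8 | ACCEPTS_RGB;
    }

    // Tests whether a particular flag is set in a packed flag vector.
    public static boolean has(int flags, int flag) {
        return (flags & flag) != 0;
    }

    public static void main(String[] args) {
        int flags = capabilities();
        System.out.println(Integer.toBinaryString(flags)); // 101
        System.out.println(has(flags, ACCEPTS_RGB));       // true
        System.out.println(has(flags, ACCEPTS_GRAY16));    // false
    }
}
```

Packing the properties into one int keeps the setup() signature fixed, no matter how many properties a plugin needs to report.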

2.2.4  A  First  Example:  Inverting  an  Image 

Let us look at a real example to quickly illustrate this mechanism. The task of our first plugin is to invert any 8-bit grayscale image to turn a positive image into a negative. As we shall see later, inverting the intensity of an image is a typical point operation, which is discussed in detail in Chapter 4. In ImageJ, 8-bit grayscale images have pixel values ranging from 0 (black) to 255 (white), and we assume that the width and height of the image are M and N, respectively. The operation is very simple: the value of each image pixel I(u, v) is replaced by its inverted value,

$$
I(u,v) \leftarrow 255 - I(u,v),
$$

for all image coordinates (u, v), with $u = 0,\ldots,M-1$ and $v = 0,\ldots,N-1$.


2.2.5 Plugin My_Inverter_A (using PlugInFilter)

We decide to name our first plugin “My_Inverter_A”, which is both the name of the Java class and the name of the source file11 that contains it (see Prog. 2.1). The underscore characters (“_”) in the name cause ImageJ to recognize this class as a plugin and to insert it automatically into the menu list at startup. The Java source code in file My_Inverter_A.java contains a few import statements, followed by the definition of the class My_Inverter_A, which implements the PlugInFilter interface (because it will be applied to an existing image).

The setup() method

When a plugin of type PlugInFilter is executed, ImageJ first invokes its setup() method to obtain information about the plugin itself. In this example, setup() only returns the value DOES_8G (a static int constant specified by the PlugInFilter interface), indicating that this plugin can handle 8-bit grayscale images. The parameters args and im of the setup() method are not used in this example (see also Exercise 2.7).

The  run()  method 

As mentioned already, the run() method of a PlugInFilter plugin receives an object (ip) of type ImageProcessor, which contains the image to be processed and all relevant information about it. First, we use the ImageProcessor methods getWidth() and getHeight() to query the size of the image referenced by ip. Then we use two nested for loops (with loop variables u, v for the horizontal and vertical coordinates, respectively) to iterate over all image pixels. For reading and writing the pixel values, we use two additional methods of the class ImageProcessor:
File My_Inverter_A.java.


import ij.ImagePlus;
import ij.plugin.filter.PlugInFilter;
import ij.process.ImageProcessor;

public class My_Inverter_A implements PlugInFilter {

  public int setup(String args, ImagePlus im) {
    return DOES_8G; // this plugin accepts 8-bit grayscale images
  }

  public void run(ImageProcessor ip) {
    int M = ip.getWidth();
    int N = ip.getHeight();

    // iterate over all image coordinates (u,v)
    for (int u = 0; u < M; u++) {
      for (int v = 0; v < N; v++) {
        int p = ip.getPixel(u, v);
        ip.putPixel(u, v, 255 - p);
      }
    }
  }

}



Prog. 2.1  ImageJ plugin for inverting 8-bit grayscale images. This plugin implements the interface PlugInFilter and defines the required methods setup() and run(). The target image is received by the run() method as an instance of type ImageProcessor. ImageJ assumes that the plugin modifies the supplied image and automatically redisplays it after the plugin is executed. Program 2.2 shows an alternative implementation that is based on the PlugIn interface.


int getPixel(int u, int v)

Returns the pixel value at the given position or zero if (u, v) is outside the image bounds.

void putPixel(int u, int v, int a)

Sets the pixel value at position (u, v) to the new value a. Does nothing if (u, v) is outside the image bounds.

Both methods check the supplied image coordinates and pixel values to avoid unwanted errors. While this makes them more or less fail-safe, it also makes them slow. If we are sure that no coordinates outside the image bounds are ever accessed (as in My_Inverter_A in Prog. 2.1) and the inserted pixel values are guaranteed not to exceed the image processor’s range, we can use the significantly faster methods get() and set() in place of getPixel() and putPixel(), respectively. The most efficient way to process the image is to avoid read/write methods altogether and directly access the elements of the associated (1D) pixel array. Details on these and other methods can be found in the ImageJ API documentation.12
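The difference between checked access and direct array access can be sketched without ImageJ itself. The following class (our own, for illustration only) assumes that pixel (u, v) of an M × N 8-bit image is stored row by row at index v·M + u of a 1D byte array; in ImageJ such an array would be obtained from an 8-bit image with a cast like (byte[]) ip.getPixels(). The method getPixelChecked() mimics the bounds test of getPixel(), while invert() works on the array directly with no checks at all.

```java
public class ArrayInverter {

    // Bounds-checked read in the spirit of getPixel(): returns 0 outside the image.
    // Assumes row-major storage, pixel (u,v) at index v*M + u.
    public static int getPixelChecked(byte[] pixels, int M, int N, int u, int v) {
        if (u < 0 || u >= M || v < 0 || v >= N) return 0;
        return pixels[v * M + u] & 0xff;
    }

    // Inverts an 8-bit image in place by direct array access, without any checks.
    public static void invert(byte[] pixels) {
        for (int i = 0; i < pixels.length; i++) {
            int p = pixels[i] & 0xff;       // Java bytes are signed: map to 0..255
            pixels[i] = (byte) (255 - p);   // narrowing cast keeps the low 8 bits
        }
    }

    public static void main(String[] args) {
        int M = 3, N = 2;
        byte[] img = { 0, 10, (byte) 200, (byte) 255, 127, (byte) 128 };
        invert(img);
        System.out.println(getPixelChecked(img, M, N, 2, 0)); // 55
        System.out.println(getPixelChecked(img, M, N, 5, 5)); // 0 (outside the image)
    }
}
```

Because inversion is a point operation, invert() can run over the array in a single loop without computing (u, v) at all; the & 0xff mask is needed because Java’s byte type is signed, so a stored value of 200 would otherwise be read back as −56.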


12 http://rsbweb.nih.gov/ij/developer/api/index.html.

2.2.6 Plugin My_Inverter_B (using PlugIn)

Program 2.2 shows an alternative implementation of the inverter plugin based on ImageJ’s PlugIn interface, which requires a run() method only. In this case the reference to the current image is not supplied directly but is obtained by invoking the (static) method IJ.getImage(). If no image is currently open, getImage() automatically displays an error message and aborts the plugin. However, the subsequent test for the correct image type (GRAY8) and the corresponding error handling must be performed explicitly. The run() method accepts a single string argument that can be used to pass arbitrary information for controlling the plugin.

Prog. 2.2  Alternative implementation of the inverter plugin, based on ImageJ’s PlugIn interface. In contrast to Prog. 2.1 this plugin has no setup() method but defines a run() method only. The current image (im) is obtained as an instance of class ImagePlus by invoking the IJ.getImage() method. After checking for the proper image type the associated ImageProcessor (ip) is retrieved from im. The parameter string (args) is not used in this example. The remaining parts of the plugin are identical to Prog. 2.1, except that the (slightly faster) pixel access methods get() and set() are used. Also note that the modified image is not redisplayed automatically but by an explicit call to updateAndDraw().

import ij.IJ;
import ij.ImagePlus;
import ij.plugin.PlugIn;
import ij.process.ImageProcessor;

public class My_Inverter_B implements PlugIn {

  public void run(String args) {
    ImagePlus im = IJ.getImage();

    if (im.getType() != ImagePlus.GRAY8) {
      IJ.error("8-bit grayscale image required");
      return;
    }

    ImageProcessor ip = im.getProcessor();
    int M = ip.getWidth();
    int N = ip.getHeight();

    // iterate over all image coordinates (u,v)
    for (int u = 0; u < M; u++) {
      for (int v = 0; v < N; v++) {
        int p = ip.get(u, v);
        ip.set(u, v, 255 - p);
      }
    }

    im.updateAndDraw(); // redraw the modified image
  }
}

2.2.7 When to use PlugIn or PlugInFilter?

The choice of PlugIn or PlugInFilter is mostly a matter of taste, since both versions have their advantages and disadvantages. As a rule of thumb, we use the PlugIn type for tasks that do not require any image to be open, such as tasks that create, load, or record images or perform operations without any images. Otherwise, if one or more open images should be processed, PlugInFilter is the preferred choice, and thus almost all plugins in this book are of type PlugInFilter.

Editing,  compiling,  and  executing  the  plugin 

The Java source file for our plugin should be stored in directory <ij>/plugins/13 or an immediate subdirectory. New plugin files can be created with ImageJ’s Plugins > New... menu. ImageJ even provides a built-in Java editor for writing plugins, which is available through the Plugins > Edit... menu but unfortunately is of little use for serious programming. A better alternative is to use a modern editor or a professional Java programming environment, such as Eclipse,14 NetBeans,15 or JBuilder,16 all of which are freely available.

13 <ij> denotes ImageJ’s installation directory.

For  compiling  plugins  (to  Java  bytecode),  ImageJ  comes  with  its 
own  Java  compiler  as  part  of  its  runtime  environment.  To  compile 
and  execute  the  new  plugin,  simply  use  the  menu 

Plugins  >  Compile  and  Run... 

Compilation  errors  are  displayed  in  a  separate  log  window.  Once  the 
plugin  is  compiled,  the  corresponding  .class  file  is  automatically 
loaded  and  the  plugin  is  applied  to  the  currently  active  image.  An 
error  message  is  displayed  if  no  images  are  open  or  if  the  current 
image  cannot  be  handled  by  that  plugin. 

At startup, ImageJ automatically loads all correctly named plugins found in the <ij>/plugins/ directory (or any immediate subdirectory) and installs them in its Plugins menu. These plugins can be executed immediately without any recompilation. References to plugins can also be placed manually with the

Plugins > Shortcuts > Install Plugin...

command at any other position in the ImageJ menu tree. Sequences of plugin calls and other ImageJ commands may be recorded as macro programs with Plugins > Macros > Record.

Displaying  and  “undoing”  results 

Our first plugins in Progs. 2.1 and 2.2 did not create a new image but “destructively” modified the target image. This need not always be the case: plugins can also create additional images or compute only statistics, without modifying the original image at all. It may be surprising, though, that our plugin contains no commands for displaying the modified image. This is done automatically by ImageJ whenever it can be assumed that the image passed to a plugin was modified.17 In addition, ImageJ automatically makes a copy (“snapshot”) of the image before passing it to the run() method of a PlugInFilter-type plugin. This feature makes it possible to restore the original image (with the Edit > Undo menu) after the plugin has finished, without any explicit precautions in the plugin code.
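The snapshot idea can be illustrated outside ImageJ with a small, self-contained sketch. The class SnapshotUndoSketch and its methods runFilter() and undo() are hypothetical names chosen for this example; they only mimic the principle described above (copy the pixel data before the filter runs, restore it on request) and make no claim about how ImageJ implements this internally. Pixels are assumed to be 8-bit values stored in a flat int array.

```java
import java.util.Arrays;
import java.util.function.Consumer;

// Hypothetical illustration of "snapshot before run, restore on undo".
// This is NOT ImageJ's implementation, only a sketch of the principle.
final class SnapshotUndoSketch {
    private final int[] pixels;   // current 8-bit pixel data (flat array)
    private int[] snapshot;       // copy taken before the last filter call

    SnapshotUndoSketch(int[] pixels) {
        this.pixels = pixels;
    }

    // Framework side: save a copy first, then hand the pixels to the filter,
    // which may modify them in place.
    void runFilter(Consumer<int[]> filter) {
        snapshot = pixels.clone();
        filter.accept(pixels);
    }

    // Restore the state saved by the last snapshot (if any).
    void undo() {
        if (snapshot != null) {
            System.arraycopy(snapshot, 0, pixels, 0, pixels.length);
            snapshot = null;
        }
    }

    int[] pixels() {
        return pixels;
    }

    public static void main(String[] args) {
        SnapshotUndoSketch img = new SnapshotUndoSketch(new int[] {0, 100, 255});
        // The "plugin" only inverts; it contains no undo logic of its own.
        img.runFilter(p -> {
            for (int k = 0; k < p.length; k++) {
                p[k] = 255 - p[k];
            }
        });
        System.out.println(Arrays.toString(img.pixels())); // prints [255, 155, 0]
        img.undo();
        System.out.println(Arrays.toString(img.pixels())); // prints [0, 100, 255]
    }
}
```

Note that the filter passed to runFilter() has no undo code at all; the copy is made by the caller, which mirrors the behavior described for PlugInFilter-type plugins.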

Logging  and  debugging 

The usual console output from Java via System.out is not available in ImageJ by default. Instead, a separate logging window can be used, which facilitates simple text output by the method

IJ.log(String s).

14 www.eclipse.org.

15 www.netbeans.org.

16 www.borland.com.

17 No automatic redisplay occurs if the NO_CHANGES flag is set in the return value of the plugin's setup() method.


Such calls may be placed at any position in the plugin code for quick and simple debugging at runtime. However, because of the typically large amounts of data involved, they should be used with caution in real image-processing operations. In particular, when placed in the body of inner processing loops that could execute millions of times, text output may produce an enormous overhead compared to the time used for the actual calculations.

Fig. 2.3

Information displayed in ImageJ's main window is extremely helpful for debugging image-processing operations. The current cursor position is displayed in pixel coordinates unless the associated image is spatially calibrated. The way pixel values are displayed depends on the image type; in the case of a color image (as shown here) integer RGB component values are shown.

ImageJ itself does not offer much support for “real” debugging, i.e., for setting breakpoints, inspecting local variables, etc. However, it is possible to launch ImageJ from within a programming environment (IDE) such as Eclipse or NetBeans and then use all the debugging options that the given environment provides.18 In our experience, this is only needed in rare and exceptionally difficult situations. In most cases, inspecting the pixel values displayed in ImageJ's main window (see Fig. 2.3) is much simpler and more effective. In general, many errors (in particular those related to image coordinates) can be easily avoided by careful planning in advance.


2.2.8 Executing ImageJ “Commands”

If possible, it is wise in most cases to re-use existing (and extensively tested) functionality instead of re-implementing it oneself. In particular, the Java library that comes with ImageJ covers many standard image-processing operations, many of which are used throughout this book. Additional classes and methods for specific operations are contained in the associated (imagingbook) library.

18 For details see the “HowTo” section at http://imagejdocu.tudor.lu.

import ij.IJ;
import ij.ImagePlus;
import ij.plugin.PlugIn;

public class Run_Command_From_PlugIn implements PlugIn {

  public void run(String args) {
    ImagePlus im = IJ.getImage();
    IJ.run(im, "Invert", ""); // run the "Invert" command on im
    // ... continue with this plugin
  }
}

Prog. 2.3

Executing the ImageJ command “Invert” within a Java plugin of type PlugIn.

import ij.IJ;
import ij.ImagePlus;
import ij.plugin.filter.PlugInFilter;
import ij.process.ImageProcessor;

public class Run_Command_From_PlugInFilter implements PlugInFilter {
  ImagePlus im;

  public int setup(String args, ImagePlus im) {
    this.im = im;
    return DOES_ALL;
  }

  public void run(ImageProcessor ip) {
    im.unlock();               // unlock im to run other commands
    IJ.run(im, "Invert", "");  // run the "Invert" command on im
    im.lock();                 // lock im again (to be safe)
    // ... continue with this plugin
  }
}

Prog. 2.4

Executing the ImageJ command “Invert” within a Java plugin of type PlugInFilter. In this case the current image is automatically locked during plugin execution, such that no other operation may be applied to it. However, the image can be temporarily unlocked by calling unlock() and lock(), respectively, to run the external command.

In the context of ImageJ, the term “command” refers to any composite operation implemented as a (Java) plugin, a macro command, or a script.19 ImageJ itself includes numerous commands, which can be listed with the menu Plugins > Utilities > Find Commands... and are usually referenced “by name”, i.e., by a unique string. For example, the standard operation for inverting an image (Edit > Invert) is implemented by the Java class ij.plugin.filter.Filters (with the argument "invert").

An existing command can also be executed from within a Java plugin with the method IJ.run(), as demonstrated for the “Invert” command in Prog. 2.3. Some caution is required with plugins of type PlugInFilter, since these lock the current image during execution, such that no other operation can be applied to it. The example in Prog. 2.4 shows how this can be resolved by a pair of calls to unlock() and lock(), respectively, to temporarily release the current image.

A convenient tool for putting together complex commands is ImageJ's built-in Macro Recorder. Started with Plugins > Macros > Record..., it logs all subsequent commands in a text file for later use. It can be set up to record commands in various modes, including Java, JavaScript, BeanShell, or ImageJ macro code. Of course, it also records the application of self-defined plugins.

19 Scripting languages for ImageJ currently include JavaScript, BeanShell, and Python.


2.3 Additional Information on ImageJ and Java

In the following chapters, we mostly use concrete plugins and Java code to describe algorithms and data structures. This not only makes these examples immediately applicable, but should also help readers acquire additional skills for using ImageJ in a step-by-step fashion. To keep the text compact, we often describe only the run() method of a particular plugin, plus additional class and method definitions if they are relevant in the given context. The complete source code for these examples can, of course, be downloaded from the book's supporting website.20

2.3.1  Resources  for  ImageJ 

The complete and most current API reference, including source code, tutorials, and many example plugins, can be found on the official ImageJ website. Another great source for any serious plugin programming is the tutorial by Werner Bailer [12].

2.3.2  Programming  with  Java 

While this book does not require extensive Java skills from its readers, some elementary knowledge is essential for understanding or extending the given examples. There is a huge and still-growing number of introductory textbooks on Java, such as [8, 29, 66, 70, 208] and many others. For readers with programming experience who have not worked with Java before, we particularly recommend some of the tutorials on Oracle's Java website.21 Also, in Appendix F of this book, readers will find a small compilation of specific Java topics that cause frequent problems or programming errors.


2.4  Exercises 

Exercise 2.1. Install the current version of ImageJ on your computer and make yourself familiar with the built-in commands (open, convert, edit, and save images).

Exercise 2.2. Write a new ImageJ plugin that reflects a grayscale image horizontally (or vertically) using My_Inverter.java (Prog. 2.1) as a template. Test your new plugin with appropriate images of different sizes (odd, even, extremely small) and inspect the results carefully.

20 www.imagingbook.com.

21 http://docs.oracle.com/javase/.


Exercise  2.3.  The  run()  method  of  plugin  Inverter_Plugin_A  (see 
Prog.  2.1)  iterates  over  all  pixels  of  the  given  image.  Find  out  in  which 
order  the  pixels  are  visited:  along  the  (horizontal)  lines  or  along  the 
(vertical)  columns?  Make  a  drawing  to  illustrate  this  process. 

Exercise 2.4. Create an ImageJ plugin for 8-bit grayscale images of arbitrary size that paints a white frame (with pixel value 255) 10 pixels wide into the image (without increasing its size). Make sure this plugin also works for very small images.

Exercise 2.5. Create a plugin for 8-bit grayscale images that calculates the sum of all pixel values and prints the result (with IJ.log()). Use a variable of type int or long for accumulating the pixel values. What is the maximum image size for which we can be certain that the result of summing with an int variable is correct?

Exercise 2.6. Create a plugin for 8-bit grayscale images that calculates and prints the minimum and maximum pixel values in the current image (with IJ.log()). Compare your output to the results obtained with Analyze > Measure.

Exercise 2.7. Write a new ImageJ plugin that shifts an 8-bit grayscale image horizontally and circularly until the original state is reached again. To display the modified image after each shift, a reference to the corresponding ImagePlus object is required (ImageProcessor has no display methods). The ImagePlus object is only accessible to the plugin's setup() method, which is automatically called before the run() method. Modify the definition in Prog. 2.1 to keep a reference and to redraw the ImagePlus object as follows:

import ij.ImagePlus;
import ij.plugin.filter.PlugInFilter;
import ij.process.ImageProcessor;

public class XY_Plugin implements PlugInFilter {
  ImagePlus im; // new variable!

  public int setup(String args, ImagePlus im) {
    this.im = im; // reference to the associated ImagePlus object
    return DOES_8G;
  }

  public void run(ImageProcessor ip) {
    // ... modify ip
    im.updateAndDraw(); // redraw the associated ImagePlus object
    // ...
  }
}


3 Histograms and Image Statistics


Histograms are used to depict image statistics in an easily interpreted visual format. With a histogram, it is easy to detect certain types of problems in an image; for example, visual inspection of its histogram readily shows whether an image is properly exposed. In fact, histograms are so useful that modern digital cameras often provide a real-time histogram overlay on the viewfinder (Fig. 3.1) to help prevent taking poorly exposed pictures. It is important to catch errors like this at the image capture stage because poor exposure results in a permanent loss of information, which cannot be recovered later using image-processing techniques. In addition to their usefulness during image capture, histograms are also used later to improve the visual appearance of an image and as a “forensic” tool for determining what type of processing has previously been applied to an image. The final part of this chapter shows how to calculate simple image statistics from the original image, its histogram, or the so-called integral image.


Fig.  3.1 

Digital  camera  back  display 
showing  the  associated  RGB 
histograms. 


© Springer-Verlag London 2016

W. Burger, M.J. Burge, Digital Image Processing, Texts in Computer Science, DOI 10.1007/978-1-4471-6684-9_3


3.1  What  is  a  Histogram? 


Histograms  in  general  are  frequency  distributions,  and  histograms  of 
images  describe  the  frequency  of  the  intensity  values  that  occur  in 
an  image.  This  concept  can  be  easily  explained  by  considering  an 
old-fashioned  grayscale  image  like  the  one  shown  in  Fig.  3.2. 


Fig. 3.2

An 8-bit grayscale image and a histogram depicting the frequency distribution of its 256 intensity values.

The histogram h for a grayscale image I with intensity values in the range $I(u,v) \in [0, K-1]$ holds exactly K entries, where $K = 2^8 = 256$ for a typical 8-bit grayscale image. Each single histogram entry is defined as

h(i) = the number of pixels in I with the intensity value i,

for all $0 \le i < K$. More formally stated,1

$$h(i) = \mathrm{card}\{(u,v) \mid I(u,v) = i\}. \qquad (3.1)$$

Therefore, h(0) is the number of pixels with the value 0, h(1) the number of pixels with the value 1, and so forth. Finally, h(255) is the number of all white pixels with the maximum intensity value $255 = K-1$. The result of the histogram computation is a 1D vector h of length K. Figure 3.3 gives an example for an image with $K = 16$ possible intensity values.
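As a minimal sketch, Eqn. (3.1) can be transcribed almost literally into Java by counting, for each intensity i, the pixels whose value equals i. The image layout (a plain int[][] array indexed as I[v][u] with values in [0, K−1]) and the class name HistogramByDefinition are assumptions made for this illustration; nothing here is part of ImageJ.

```java
// Literal transcription of Eqn. (3.1): h(i) = card{(u,v) | I(u,v) = i}.
// Assumption: the image is an int[][] indexed as I[v][u], values in [0, K-1].
final class HistogramByDefinition {

    // Cardinality of the set {(u,v) | I(u,v) = i}: number of pixels equal to i.
    static int count(int[][] I, int i) {
        int n = 0;
        for (int[] row : I) {
            for (int p : row) {
                if (p == i) {
                    n++;
                }
            }
        }
        return n;
    }

    // Full histogram vector of length K, evaluating count() once per intensity.
    // This needs K passes over the image, which is fine for a definition but slow.
    static int[] histogram(int[][] I, int K) {
        int[] h = new int[K];
        for (int i = 0; i < K; i++) {
            h[i] = count(I, i);
        }
        return h;
    }

    public static void main(String[] args) {
        int[][] I = {
            {2, 2, 0, 15},
            {2, 7, 2, 2}
        };
        int[] h = histogram(I, 16);           // K = 16, as in Fig. 3.3
        System.out.println("h(2) = " + h[2]); // prints "h(2) = 5"
    }
}
```

This literal version scans the whole image once for every one of the K intensity values; Sec. 3.3 describes the usual single-pass approach with one counter per value.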


Fig. 3.3

Histogram vector for an image with $K = 16$ possible intensity values. The indices of the vector element i = 0 … 15 represent intensity values. The value of 10 at index 2 means that the image contains 10 pixels of intensity value 2.

Since the histogram encodes no information about where each of its individual entries originated in the image, it contains no information about the spatial arrangement of pixels in the image. This is intentional, since the main function of a histogram is to provide statistical information (e.g., the distribution of intensity values) in a compact form. Is it possible to reconstruct an image using only its histogram? That is, can a histogram be somehow “inverted”? Given the loss of spatial information, in all but the most trivial cases, the answer is no. As an example, consider the wide variety of images you could construct using the same number of pixels of a specific value. These images would appear different but have exactly the same histogram (Fig. 3.4).

1 card{...} denotes the number of elements (“cardinality”) in a set (see also Sec. A.1 in the Appendix).

Fig. 3.4

Three very different images with identical histograms.
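The following sketch illustrates this point with a tiny example: rearranging the pixels of an image (here simply by reversing their order) yields a different image, yet its histogram is unchanged. The flat int[] pixel representation and the class and method names are illustrative assumptions.

```java
import java.util.Arrays;

// Rearranging pixels changes the image but not its histogram.
// Assumption: 8-bit pixel values stored in a flat int array.
final class SameHistogramDemo {

    static int[] histogram(int[] pixels, int K) {
        int[] h = new int[K];
        for (int p : pixels) {
            h[p]++;
        }
        return h;
    }

    // Returns a new array with the pixels in reverse order (a simple permutation).
    static int[] reversed(int[] pixels) {
        int n = pixels.length;
        int[] out = new int[n];
        for (int k = 0; k < n; k++) {
            out[k] = pixels[n - 1 - k];
        }
        return out;
    }

    public static void main(String[] args) {
        int[] a = {0, 0, 0, 255, 255, 128};
        int[] b = reversed(a);
        System.out.println(Arrays.equals(a, b));                    // prints false
        System.out.println(Arrays.equals(histogram(a, 256),
                                         histogram(b, 256)));       // prints true
    }
}
```

Any other permutation of the pixels, such as sorting them, would leave the histogram unchanged as well.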


3.2  Interpreting  Histograms 

A histogram depicts problems that originate during image acquisition, such as those involving contrast and dynamic range, as well as artifacts resulting from image-processing steps that were applied to the image. Histograms are often used to determine if an image is making effective use of its intensity range (Fig. 3.5) by examining the size and uniformity of the histogram's distribution.

Fig. 3.5

Effective intensity range. The graph depicts the frequencies of pixel values linearly (black bars) and logarithmically (gray bars). The logarithmic form makes even relatively low occurrences, which can be very important in the image, readily apparent.


3.2.1 Image Acquisition

Histograms make typical exposure problems readily apparent. As an example, a histogram where a large section of the intensity range at one end is largely unused while the other end is crowded with high-value peaks (Fig. 3.6) is representative of an improperly exposed image.

Fig. 3.6

Exposure errors are readily apparent in histograms. Underexposed (a), properly exposed (b), and overexposed (c) photographs.

Contrast

Contrast is understood as the range of intensity values effectively used within a given image, that is, the difference between the image's maximum and minimum pixel values. A full-contrast image makes effective use of the entire range of available intensity values from $a = a_{\min}, \ldots, a_{\max}$ with $a_{\min} = 0$, $a_{\max} = K-1$ (black to white). Using this definition, image contrast can be easily read directly from the histogram. Figure 3.7 illustrates how varying the contrast of an image affects its histogram.
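Reading the contrast from the histogram amounts to finding the lowest and highest intensities with a nonzero count. A possible sketch, assuming the histogram is given as an int array of length K (the class and method names are hypothetical):

```java
// Reads the used intensity range directly from a histogram h of length K.
final class ContrastFromHistogram {

    // Returns {low, high}: the smallest and largest intensities that occur.
    static int[] usedRange(int[] h) {
        int lo = 0;
        while (lo < h.length && h[lo] == 0) {
            lo++;
        }
        if (lo == h.length) {
            throw new IllegalArgumentException("histogram is empty");
        }
        int hi = h.length - 1;
        while (h[hi] == 0) {
            hi--;
        }
        return new int[] {lo, hi};
    }

    // True if the image spans the full range 0 ... K-1 (black to white).
    static boolean isFullContrast(int[] h) {
        int[] r = usedRange(h);
        return r[0] == 0 && r[1] == h.length - 1;
    }

    public static void main(String[] args) {
        int[] h = new int[256];
        h[40] = 10;     // a low-contrast image: only mid-range values occur
        h[100] = 50;
        h[180] = 5;
        int[] r = usedRange(h);
        System.out.println(r[0] + " .. " + r[1] + ", contrast = " + (r[1] - r[0]));
        // prints "40 .. 180, contrast = 140"
    }
}
```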


Dynamic range

The dynamic range of an image is, in principle, understood as the number of distinct pixel values in an image. In the ideal case, the dynamic range encompasses all K usable pixel values, in which case the value range is completely utilized. When an image has an available range of contrast $a = a_{\mathrm{low}}, \ldots, a_{\mathrm{high}}$, with

$$a_{\min} \le a_{\mathrm{low}} \quad \text{and} \quad a_{\mathrm{high}} \le a_{\max},$$

then the maximum possible dynamic range is achieved when all the intensity values lying in this range are utilized (i.e., appear in the image; Fig. 3.8).
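Following the definition above, the dynamic range can be estimated by counting the nonzero histogram entries, and the maximum dynamic range for a given contrast range is reached when every value between the lowest and highest used intensity actually occurs. The sketch below assumes an int[] histogram of length K; its names are hypothetical. The main() example uses only every fourth intensity, which gives 64 distinct values, similar to the situation in Fig. 3.8(b).

```java
// Dynamic range measured as the number of distinct values in a histogram.
final class DynamicRange {

    // Number of histogram entries with a nonzero count.
    static int distinctValues(int[] h) {
        int n = 0;
        for (int count : h) {
            if (count > 0) {
                n++;
            }
        }
        return n;
    }

    // True if every intensity between the lowest and highest used value occurs,
    // i.e., the maximum dynamic range for this contrast range is achieved.
    static boolean usesEntireRange(int[] h) {
        int lo = 0;
        while (lo < h.length && h[lo] == 0) {
            lo++;
        }
        if (lo == h.length) {
            return false;                  // empty histogram
        }
        int hi = h.length - 1;
        while (h[hi] == 0) {
            hi--;
        }
        return distinctValues(h) == hi - lo + 1;
    }

    public static void main(String[] args) {
        int[] h = new int[256];
        for (int a = 0; a < 256; a += 4) { // only every 4th value occurs
            h[a] = 1;
        }
        System.out.println(distinctValues(h));  // prints 64
        System.out.println(usesEntireRange(h)); // prints false
    }
}
```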

While the contrast of an image can be increased by transforming its existing values so that they utilize more of the underlying value range available, the dynamic range of an image can only be increased by introducing artificial (that is, not originating with the image sensor) values using methods such as interpolation (see Ch. 22). An image with a high dynamic range is desirable because it will suffer less image-quality degradation during image processing and compression. Since it is not possible to increase dynamic range after image acquisition in a practical way, professional cameras and scanners work at depths of more than 8 bits, often 12 to 14 bits per channel, in order to provide high dynamic range at the acquisition stage. While most output devices, such as monitors and printers, are unable to actually reproduce more than 256 different shades, a high dynamic range is always beneficial for subsequent image processing or archiving.

Fig. 3.7

How changes in contrast affect the histogram: low contrast (a), normal contrast (b), high contrast (c).

Fig. 3.8

How changes in dynamic range affect the histogram: high dynamic range (a), low dynamic range with 64 intensity values (b), extremely low dynamic range with only 6 intensity values (c).


3.2.2 Image Defects

Histograms can be used to detect a wide range of image defects that originate either during image acquisition or as the result of later image processing. Since histograms always depend on the visual characteristics of the scene captured in the image, no single “ideal” histogram exists. While a given histogram may be optimal for a specific scene, it may be entirely unacceptable for another. As an example, the ideal histogram for an astronomical image would likely be very different from that of a good landscape or portrait photo. Nevertheless, there are some general rules; for example, when taking a landscape image with a digital camera, you can expect the histogram to have evenly distributed intensity values and no isolated spikes.

Saturation

Ideally the contrast range of a sensor, such as that used in a camera, should be greater than the range of the intensity of the light that it receives from a scene. In such a case, the resulting histogram will be smooth at both ends because the light received from the very bright and the very dark parts of the scene will be less than the light received from the other parts of the scene. Unfortunately, this ideal is often not the case in reality, and illumination outside of the sensor's contrast range, arising for example from glossy highlights and especially dark parts of the scene, cannot be captured and is lost. The result is a histogram that is saturated at one or both ends of its range. The illumination values lying outside of the sensor's range are mapped to its minimum or maximum values and appear on the histogram as significant spikes at the tail ends. This typically occurs in an under- or overexposed image and is generally not avoidable when the inherent contrast range of the scene exceeds the range of the system's sensor (Fig. 3.9(a)).
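Since saturation shows up as spikes in the outermost histogram bins, a simple heuristic check is to measure which fraction of all pixels falls into h(0) and h(K−1). The sketch below does this; the class name and especially the threshold passed to looksClipped() are illustrative assumptions, not established values, and a suitable limit depends on the kind of scene.

```java
// Heuristic saturation check based on the two outermost histogram bins.
final class SaturationCheck {

    // Fraction of all pixels that fall into the lowest (0) or highest (K-1) bin.
    static double tailFraction(int[] h) {
        long total = 0;
        for (int count : h) {
            total += count;
        }
        if (total == 0) {
            return 0.0;
        }
        long tails = (long) h[0] + h[h.length - 1];
        return (double) tails / total;
    }

    // Flags the image as clipped if the tail bins hold more than 'limit' of all
    // pixels. The limit is an illustrative parameter, not a standard value.
    static boolean looksClipped(int[] h, double limit) {
        return tailFraction(h) > limit;
    }

    public static void main(String[] args) {
        int[] h = new int[256];
        for (int a = 0; a < 256; a++) {
            h[a] = 100;                             // smooth, uniform histogram
        }
        System.out.println(looksClipped(h, 0.05));  // prints false
        h[255] = 20000;                             // spike at the top: clipped highlights
        System.out.println(looksClipped(h, 0.05));  // prints true
    }
}
```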


Fig. 3.9

Effect of image capture errors on histograms: saturation of high intensities (a), histogram gaps caused by a slight increase in contrast (b), and histogram spikes resulting from a reduction in contrast (c).


Spikes and gaps

As discussed already, the intensity value distribution for an unprocessed image is generally smooth; that is, it is unlikely that isolated spikes (except for possible saturation effects at the tails) or gaps will appear in its histogram. It is also unlikely that the count of any given intensity value will differ greatly from that of its neighbors (i.e., it is locally smooth). While artifacts like these are observed very rarely in original images, they will often be present after an image has been manipulated, for instance, by changing its contrast. Increasing the contrast (see Ch. 4) causes the histogram lines to separate from each other and, due to the discrete values, gaps are created in the histogram (Fig. 3.9(b)). Decreasing the contrast leads, again because of the discrete values, to the merging of values that were previously distinct. This results in increases in the corresponding histogram entries and ultimately leads to highly visible spikes in the histogram (Fig. 3.9(c)).2
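Both effects are easy to reproduce numerically. The sketch below scales an image that contains every 8-bit value exactly once by a constant factor (a simple linear contrast change chosen for illustration, with rounding and clamping to [0, 255]; Ch. 4 treats such point operations properly) and then inspects the histogram: scaling by 1.5 leaves empty bins (gaps), while scaling by 0.5 merges pairs of values into bins with count 2 (spikes). Class and method names are hypothetical.

```java
// Reproduces histogram gaps (contrast increase) and spikes (contrast decrease).
final class ContrastArtifacts {

    // Scales each 8-bit value by s, rounds, and clamps the result to [0, 255].
    static int[] scaleContrast(int[] pixels, double s) {
        int[] out = new int[pixels.length];
        for (int k = 0; k < pixels.length; k++) {
            int a = (int) Math.round(pixels[k] * s);
            out[k] = Math.min(255, Math.max(0, a));
        }
        return out;
    }

    static int[] histogram(int[] pixels) {
        int[] h = new int[256];
        for (int p : pixels) {
            h[p]++;
        }
        return h;
    }

    // Empty bins strictly between the lowest and highest occupied bin (gaps).
    static int countGaps(int[] h) {
        int lo = 0, hi = h.length - 1;
        while (lo < h.length && h[lo] == 0) lo++;
        while (hi >= 0 && h[hi] == 0) hi--;
        int gaps = 0;
        for (int a = lo + 1; a < hi; a++) {
            if (h[a] == 0) gaps++;
        }
        return gaps;
    }

    // Largest single bin count.
    static int maxCount(int[] h) {
        int m = 0;
        for (int c : h) m = Math.max(m, c);
        return m;
    }

    public static void main(String[] args) {
        int[] ramp = new int[256];              // each value 0..255 occurs once
        for (int k = 0; k < 256; k++) ramp[k] = k;
        int[] hHigh = histogram(scaleContrast(ramp, 1.5)); // increased contrast
        int[] hLow = histogram(scaleContrast(ramp, 0.5));  // decreased contrast
        System.out.println("gaps after x1.5: " + countGaps(hHigh)); // prints "gaps after x1.5: 85"
        System.out.println("max bin after x0.5: " + maxCount(hLow)); // prints "max bin after x0.5: 2"
    }
}
```

The factor 1.5 also maps all original values above 170 to 255, so that result additionally shows a saturation spike in the top bin.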

2 Unfortunately, these types of errors are also caused by the internal contrast “optimization” routines of some image-capture devices, especially consumer-type scanners.

Impacts of image compression

Image compression also changes an image in ways that are immediately evident in its histogram. As an example, during GIF compression, an image's dynamic range is reduced to only a few intensities or colors, resulting in an obvious line structure in the histogram that cannot be removed by subsequent processing (Fig. 3.10). Generally, a histogram can quickly reveal whether an image has ever been subjected to color quantization, such as occurs during conversion to a GIF image, even if the image has subsequently been converted to a full-color format such as TIFF or JPEG.

Fig. 3.10

Color quantization effects resulting from GIF conversion. The original image converted to a 256 color GIF image (left). Original histogram (a) and the histogram after GIF conversion (b). When the RGB image is scaled by 50%, some of the lost colors are recreated by interpolation, but the results of the GIF conversion remain clearly visible in the histogram (c).

Figure 3.11 illustrates what occurs when a simple line graphic with only two gray values (128, 255) is subjected to a compression method such as JPEG, which is designed not for line graphics but for natural photographs. The histogram of the resulting image clearly shows that it now contains a large number of gray values that were not present in the original image, resulting in a poor-quality image3 that appears dirty, fuzzy, and blurred.


3.3 Calculating Histograms

Computing the histogram of an 8-bit grayscale image containing intensity values between 0 and 255 is a simple task. All we need is a set of 256 counters, one for each possible intensity value. First, all counters are initialized to zero. Then we iterate through the image I, determining the pixel value p at each location (u, v), and incrementing the corresponding counter by one. At the end, each counter will contain the number of pixels in the image that have the corresponding intensity value.

An image with K possible intensity values requires exactly K counter variables; for example, since an 8-bit grayscale image can contain at most 256 different intensity values, we require 256 counters. While individual counters make sense conceptually, an actual implementation would not use K individual variables to represent the counters but instead would use an array with K entries (int[256] in Java). In this example, the actual implementation as an array is straightforward. Since the intensity values begin at zero (like arrays in Java) and are all non-negative, they can be used directly as the indices $i \in [0, K-1]$ of the histogram array. Program 3.1 contains the complete Java source code for computing a histogram within the run() method of an ImageJ plugin.

3 Using JPEG compression on images like this, for which it was not designed, is one of the most egregious of imaging errors. JPEG is designed for photographs of natural scenes with smooth color transitions, and using it to compress iconic images with large areas of the same color results in strong visual artifacts (see, e.g., Fig. 1.9 on p. 17).

Fig. 3.11

Effects of JPEG compression. The original image (a) contained only two different gray values, as its histogram (b) makes readily apparent. JPEG compression, a poor choice for this type of image, results in numerous additional gray values, which are visible in both the resulting image (c) and its histogram (d). In both histograms, the linear frequency (black bars) and the logarithmic frequency (gray bars) are shown.

 1  public class Compute_Histogram implements PlugInFilter {
 2
 3    public int setup(String arg, ImagePlus img) {
 4      return DOES_8G + NO_CHANGES;
 5    }
 6
 7    public void run(ImageProcessor ip) {
 8      int[] h = new int[256]; // histogram array
 9      int M = ip.getWidth();
10      int N = ip.getHeight();
11
12      for (int v = 0; v < N; v++) {
13        for (int u = 0; u < M; u++) {
14          int i = ip.getPixel(u, v);
15          h[i] = h[i] + 1;
16        }
17      }
18      // ... histogram h can now be used
19    }
20  }

Prog. 3.1

ImageJ plugin for computing the histogram of an 8-bit grayscale image. The setup() method returns DOES_8G + NO_CHANGES, which indicates that this plugin requires an 8-bit grayscale image and will not alter it (line 4). In Java, all elements of a newly instantiated numerical array are automatically initialized to zero (line 8).

At the start of Prog. 3.1, the array h of type int[] is created (line 8) and its elements are automatically initialized4 to 0. It makes no difference, at least in terms of the final result, whether the image is traversed in row or column order, as long as all pixels in the image are visited exactly once. In contrast to Prog. 2.1, in this example we traverse the image in the standard row-first order, such that the outer for loop iterates over the vertical coordinates v and the inner loop over the horizontal coordinates u.5 Once the histogram has been calculated, it is available for further processing steps or for being displayed.

4 In Java, arrays of primitives such as int and double are initialized at creation to 0 in the case of integer types or 0.0 for floating-point types, while arrays of objects are initialized to null.

Of course, histogram computation is already implemented in ImageJ
and is available via the method getHistogram() for objects of
the class ImageProcessor. If we use this built-in method, the run()
method of Prog. 3.1 can be simplified to

public void run(ImageProcessor ip) {
  int[] h = ip.getHistogram(); // built-in ImageJ method
  // ... histogram h can now be used
}
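The same counting loop also works outside ImageJ on an ordinary Java array. The following is a minimal sketch. It assumes the image is a hypothetical `int[][] I` stored row by row (`I[v][u]`) with values in [0, K−1]. The class and method names are made up for this illustration and are not part of ImageJ.

```java
// Sketch (not from the book): histogram of an integer image held in a plain
// Java array, stored row by row as I[v][u] (assumed layout).
class PlainHistogram {

    // Returns h of size K, where h[i] is the number of pixels with value i.
    static int[] histogram(int[][] I, int K) {
        int[] h = new int[K];                          // entries start at 0
        for (int v = 0; v < I.length; v++) {          // rows (vertical coordinate)
            for (int u = 0; u < I[v].length; u++) {   // columns (horizontal coordinate)
                int i = I[v][u];
                if (i < 0 || i >= K)
                    throw new IllegalArgumentException("pixel value out of range: " + i);
                h[i] = h[i] + 1;
            }
        }
        return h;
    }

    public static void main(String[] args) {
        int[][] img = { {0, 1, 1}, {255, 1, 0} };
        int[] h = histogram(img, 256);
        System.out.println(h[0] + " " + h[1] + " " + h[255]); // prints "2 3 1"
    }
}
```

Traversing rows in the outer loop mirrors the row-first order used in Prog. 3.1.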


3.4  Histograms  of  Images  with  More  than  8  Bits 

Normally histograms are computed in order to visualize the distribution
of the image's intensity values on the screen. This presents no problem when dealing
with images having 2^8 = 256 possible values, but when an image uses a larger
range of values, for instance 16- and 32-bit or floating-point images
(see Table 1.1), then the growing number of necessary histogram entries
makes this no longer practical.


3.4.1  Binning 


Since it is not possible to represent each intensity value with its own
entry in the histogram, we will instead let a given entry in the histogram
represent a range of intensity values. This technique is often
referred to as "binning" since you can visualize it as collecting a range
of pixel values in a container such as a bin or bucket. In a binned
histogram of size B, each bin h(j) contains the number of image
elements having values within the interval $[a_j, a_{j+1})$ (analogous to Eqn. (3.1)),
and therefore

$$h(j) = \operatorname{card}\,\{(u,v) \mid a_j \le I(u,v) < a_{j+1}\}, \quad \text{for } 0 \le j < B. \tag{3.2}$$

Typically the range of K possible values is divided into B bins
of equal size $k_B = K/B$, such that the starting value of
interval j is

$$a_j = j \cdot \frac{K}{B} = j \cdot k_B.$$

5 In this way, image elements are traversed in exactly the same way that
they are laid out in computer memory, resulting in more efficient memory
access and with it the possibility of increased performance, especially
when dealing with larger images (see also Appendix F).

3.4.2 Example

In order to create a typical histogram containing B = 256 entries
from a 14-bit image, one would divide the original value range
0, ..., 2^14 − 1 into 256 equal intervals, each of length $k_B = 2^{14}/256 = 64$,
such that $a_0 = 0$, $a_1 = 64$, $a_2 = 128$, ..., $a_{255} = 16320$ and
$a_{256} = a_B = 2^{14} = 16384 = K$. This gives the following association
between pixel values and histogram bins h(0), ..., h(255):

0, ..., 63 → h(0),
64, ..., 127 → h(1),
128, ..., 191 → h(2),
⋮
16320, ..., 16383 → h(255).
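The association above follows directly from integer division. This small sketch evaluates a·B/K for a few values of the 14-bit example. The helper name `binIndex` is hypothetical. The product is computed as `long` as a precaution, because a·B can exceed the `int` range when K is large.

```java
// Sketch: bin index for linear binning, j = floor(a * B / K).
class BinningExample {

    // Integer division truncates; for a >= 0 this equals the floor.
    static int binIndex(int a, int K, int B) {
        return (int) ((long) a * B / K);  // long avoids overflow of a * B for large K
    }

    public static void main(String[] args) {
        int K = 1 << 14;   // 16384 intensity values (14-bit image)
        int B = 256;       // number of bins, so each bin spans K/B = 64 values
        int[] samples = {0, 63, 64, 127, 128, 191, 16320, 16383};
        for (int a : samples) {
            System.out.println(a + " -> h(" + binIndex(a, K, B) + ")");
        }
    }
}
```

Running it prints one line per sample, for example `64 -> h(1)` and `16383 -> h(255)`, which reproduces the table above.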


3.4.3  Implementation 


If, as in the previous example, the value range 0, ..., K−1 is divided
into equal-length intervals $k_B = K/B$, there is naturally no need to
use a mapping table to find $a_j$, since for a given pixel value $a = I(u,v)$
the correct histogram element j is easily computed. In this case, it is
enough to simply divide the pixel value $I(u,v)$ by the interval length
$k_B$; that is,

$$\frac{I(u,v)}{k_B} = \frac{I(u,v)}{K/B} = \frac{I(u,v)\cdot B}{K}. \tag{3.3}$$

As an index to the appropriate histogram bin h(j), we require an
integer value

$$j = \left\lfloor \frac{I(u,v)\cdot B}{K} \right\rfloor, \tag{3.4}$$


where $\lfloor\cdot\rfloor$ denotes the floor operator.⁶ A Java method for computing
histograms by "linear binning" is given in Prog. 3.2. Note that all the
computations from Eqn. (3.4) are done with integer numbers without
using any floating-point operations. Also there is no need to explicitly
call the floor function because the expression

a * B / K

in line 11 uses integer division, and in Java the fractional result of
such an operation is truncated, which is equivalent to applying the
floor function (assuming nonnegative arguments).⁷ The binning method
can also be applied, in a similar way, to floating-point images.
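For floating-point images the text leaves the details open. One possible sketch assumes the caller supplies the value range [a_min, a_max], for example the image's minimum and maximum, and computes the bin index with an explicit floor. Values outside the range and NaN are simply skipped here. That is a design choice of this example, not something prescribed above, and the names are hypothetical.

```java
import java.util.Arrays;

// Sketch: linear binning of floating-point pixel values into B bins that
// cover a caller-supplied range [aMin, aMax] (an assumption; the text does
// not fix how this range is chosen).
class FloatBinning {

    static int[] binnedHistogram(float[] pixels, int B, double aMin, double aMax) {
        if (!(aMax > aMin)) throw new IllegalArgumentException("need aMax > aMin");
        int[] H = new int[B];
        double kB = (aMax - aMin) / B;                 // width of one bin
        for (float a : pixels) {
            if (Float.isNaN(a) || a < aMin || a > aMax) continue;  // ignored
            int j = (int) Math.floor((a - aMin) / kB);
            j = Math.min(j, B - 1);                    // a == aMax falls into the last bin
            H[j]++;
        }
        return H;
    }

    public static void main(String[] args) {
        float[] px = {0.0f, 0.24f, 0.25f, 0.6f, 1.0f, 1.5f};
        System.out.println(Arrays.toString(binnedHistogram(px, 4, 0.0, 1.0))); // [2, 1, 1, 1]
    }
}
```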


3.5  Histograms  of  Color  Images 

When referring to histograms of color images, typically what is meant
is a histogram of the image intensity (luminance) or of the individual
color channels. Both of these variants are supported by practically
every image-processing application and are used to objectively appraise
the image quality, especially directly after image acquisition.

6 ⌊x⌋ rounds x down to the next whole number (see Appendix A).

7 For a more detailed discussion, see the section on integer division in
Java in Appendix F (p. 765).
Prog. 3.2
Histogram computation using "binning" (Java method). Example of computing
a histogram with B = 32 bins for an 8-bit grayscale image with
K = 256 intensity levels. The method binnedHistogram() returns the
histogram of the image object ip passed to it as an int array of size B.

 1  int[] binnedHistogram(ImageProcessor ip) {
 2    int K = 256; // number of intensity values
 3    int B = 32;  // size of histogram, must be defined
 4    int[] H = new int[B]; // histogram array
 5    int w = ip.getWidth();
 6    int h = ip.getHeight();
 7
 8    for (int v = 0; v < h; v++) {
 9      for (int u = 0; u < w; u++) {
10        int a = ip.getPixel(u, v);
11        int i = a * B / K; // integer operations only!
12        H[i] = H[i] + 1;
13      }
14    }
15    // return binned histogram
16    return H;
17  }

3.5.1  Intensity  Histograms 

The intensity or luminance histogram hLum of a color image is nothing
more than the histogram of the corresponding grayscale image, so
naturally all aspects of the preceding discussion also apply to this
type of histogram. The grayscale image is obtained by computing
the luminance from the individual channels of the color image. When
computing the luminance, it is not sufficient to simply average the
values of each color channel; instead, a weighted sum that takes into
account color perception theory should be computed. This process
is explained in detail in Chapter 12 (p. 304).
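As a rough illustration, the sketch below builds a luminance histogram from RGB pixels packed as 0xRRGGBB integers, the layout used by ImageJ's ColorProcessor. The weights 0.299, 0.587 and 0.114 (ITU-R BT.601) are an assumption made for this example; Chapter 12 discusses the appropriate weights. The names are hypothetical.

```java
// Sketch: luminance histogram of an RGB image given as packed 0xRRGGBB ints.
// The BT.601 weights below are an assumed choice (see Chapter 12).
class LuminanceHistogram {

    static final double WR = 0.299, WG = 0.587, WB = 0.114;

    static int[] lumHistogram(int[] rgbPixels) {
        int[] hLum = new int[256];
        for (int c : rgbPixels) {
            int r = (c >> 16) & 0xFF;   // red component
            int g = (c >> 8) & 0xFF;    // green component
            int b = c & 0xFF;           // blue component
            // weighted sum, rounded to the nearest integer
            int y = (int) Math.round(WR * r + WG * g + WB * b);
            hLum[Math.min(y, 255)]++;   // guard against rounding above 255
        }
        return hLum;
    }

    public static void main(String[] args) {
        int[] px = {0xFFFFFF, 0x000000, 0xFF0000};   // white, black, pure red
        int[] h = lumHistogram(px);
        System.out.println(h[255] + " " + h[0] + " " + h[76]); // 1 1 1
    }
}
```

Pure red lands in bin 76 (0.299 · 255 ≈ 76.2), far below its red value of 255. This is why the luminance histogram alone can hide problems in a single channel.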

3.5.2  Individual  Color  Channel  Histograms 

Even though the luminance histogram takes into account all color
channels, image errors appearing in single channels can remain undiscovered.
For example, the luminance histogram may appear clean
even when one of the color channels is oversaturated. In RGB images,
the blue channel contributes only a small amount to the total
brightness and so is especially sensitive to this problem.

Component histograms supply additional information about the
intensity distribution within the individual color channels. When
computing component histograms, each color channel is considered
a separate intensity image and each histogram is computed independently
of the other channels. Figure 3.12 shows the luminance
histogram hLum and the three component histograms hR, hG, and hB
of a typical RGB color image. Notice that saturation problems in
all three channels (red in the upper intensity region, green and blue
in the lower regions) are obvious in the component histograms but
not in the luminance histogram. In this case it is striking, and not
at all atypical, that the three component histograms appear completely
different from the corresponding luminance histogram hLum
(Fig. 3.12(b)).
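Component histograms only require unpacking each channel and counting. A minimal sketch, again assuming packed 0xRRGGBB pixels and hypothetical names:

```java
// Sketch: the three component histograms hR, hG, hB of packed 0xRRGGBB pixels.
class ComponentHistograms {

    // Returns {hR, hG, hB}; each channel is treated as a separate 8-bit image.
    static int[][] componentHistograms(int[] rgbPixels) {
        int[] hR = new int[256], hG = new int[256], hB = new int[256];
        for (int c : rgbPixels) {
            hR[(c >> 16) & 0xFF]++;   // red
            hG[(c >> 8) & 0xFF]++;    // green
            hB[c & 0xFF]++;           // blue
        }
        return new int[][] {hR, hG, hB};
    }

    public static void main(String[] args) {
        int[] px = {0xFF0000, 0xFF8000, 0x0080FF};
        int[][] h = componentHistograms(px);
        System.out.println(h[0][255] + " " + h[1][128] + " " + h[2][0]); // 2 2 2
    }
}
```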
Fig. 3.12
Histograms of an RGB color image: original image (a), luminance
histogram hLum (b), RGB color components as intensity images (c–e),
and the associated component histograms hR, hG, hB (f–h).
The fact that all three color channels have saturation problems
is only apparent in the individual component histograms. The spike
in the distribution resulting from this is found in the middle of
the luminance histogram (b).
3.5.3 Combined Color Histograms

Luminance histograms and component histograms both provide useful
information about the lighting, contrast, dynamic range, and saturation
effects relative to the individual color components. It is important
to remember that they provide no information about the
distribution of the actual colors in the image because they are based
on the individual color channels and not the combination of the individual
channels that forms the color of an individual pixel. Consider,
for example, when hR, the component histogram for the red channel,
contains the entry

hR(200)  =  24. 

Then  it  is  only  known  that  the  image  has  24  pixels  that  have  a  red 
intensity  value  of  200.  The  entry  does  not  tell  us  anything  about  the 
green  and  blue  values  of  those  pixels,  which  could  be  any  valid  value 
(*),  that  is, 

(r,g,b)  =  (200,  *,  *). 

Suppose  further  that  the  three  component  histograms  included  the 
following  entries: 

hR(50)  =  100,  hG(50)  =  100,  hB(50)  =  100. 

Could  we  conclude  from  this  that  the  image  contains  100  pixels  with 
the  color  combination 


(r,g,b)  =  (50,50,50) 

or  that  this  color  occurs  at  all?  In  general,  no,  because  there  is  no 
way  of  ascertaining  from  these  data  if  there  exists  a  pixel  in  the  image 
in  which  all  three  components  have  the  value  50.  The  only  thing  we 
could  really  say  is  that  the  color  value  (50,  50,  50)  can  occur  at  most 
100  times  in  this  image. 
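The argument can be made concrete with a tiny constructed example (ours, not from the text). It uses two images of two pixels each whose component histograms are identical, although the images have no color in common. Counting complete (r, g, b) triples in a hash map, as done below, is one simple way to obtain a combined histogram. The map-based representation and all names are assumptions of this sketch.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Sketch: identical component histograms do not imply identical colors.
class SameComponentsDifferentColors {

    // Pixels given as {r, g, b} triples; returns {hR, hG, hB}.
    static int[][] componentHistograms(int[][] rgb) {
        int[][] h = new int[3][256];
        for (int[] p : rgb)
            for (int k = 0; k < 3; k++) h[k][p[k]]++;   // k = 0: R, 1: G, 2: B
        return h;
    }

    // Combined histogram: number of pixels for each distinct (r, g, b) triple.
    static Map<Integer, Integer> colorCounts(int[][] rgb) {
        Map<Integer, Integer> counts = new HashMap<>();
        for (int[] p : rgb) {
            int key = (p[0] << 16) | (p[1] << 8) | p[2];
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        int[][] imgA = { {50, 50, 50}, {100, 100, 100} };
        int[][] imgB = { {50, 100, 100}, {100, 50, 50} };
        boolean sameComponents =
            Arrays.deepEquals(componentHistograms(imgA), componentHistograms(imgB));
        boolean sameColors = colorCounts(imgA).equals(colorCounts(imgB));
        System.out.println(sameComponents + " " + sameColors); // true false
    }
}
```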


So, although conventional (intensity or component) histograms of
color images depict important properties, they do not really provide
any useful information about the composition of the actual colors in
an image. In fact, a collection of color images can have very similar
component histograms and still contain entirely different colors. This
leads to the interesting topic of the combined histogram, which uses
statistical information about the combined color components in an
attempt to determine if two images are roughly similar in their color
composition. Features computed from this type of histogram often
form the foundation of color-based image retrieval methods. We will
return to this topic in Chapter 12, where we will explore color images
in greater detail.


3.6  The  Cumulative  Histogram 

The cumulative histogram, which is derived from the ordinary histogram,
is useful when performing certain image operations involving
histograms, for instance, histogram equalization (see Sec. 4.5). The
cumulative histogram H is defined as

$$H(i) = \sum_{j=0}^{i} h(j), \quad \text{for } 0 \le i < K. \tag{3.5}$$

A particular value H(i) is thus the sum of all histogram values h(j)
with j ≤ i. Alternatively, we can define H recursively (as implemented
in Prog. 4.2 on p. 66):

$$H(i) = \begin{cases} h(0) & \text{for } i = 0,\\ H(i-1) + h(i) & \text{for } 0 < i < K. \end{cases} \tag{3.6}$$

The cumulative histogram H(i) is a monotonically increasing (more
precisely, nondecreasing) function with the maximum value

$$H(K-1) = \sum_{j=0}^{K-1} h(j) = M \cdot N, \tag{3.7}$$

that is, the total number of pixels in an image of width M and height
N. Figure 3.13 shows a concrete example of a cumulative histogram.
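The recursive definition of H translates into a single pass over the histogram. In this sketch the entries are stored as `long` as a precaution, since the last entry equals the pixel count M·N. The names are hypothetical.

```java
// Sketch: cumulative histogram via the recursion H(0) = h(0), H(i) = H(i-1) + h(i).
class CumulativeHistogram {

    static long[] cumulative(int[] h) {
        long[] H = new long[h.length];   // long: the final entry equals M*N
        long sum = 0;
        for (int i = 0; i < h.length; i++) {
            sum += h[i];                 // H(i) = H(i-1) + h(i)
            H[i] = sum;
        }
        return H;
    }

    public static void main(String[] args) {
        int[] h = {3, 0, 2, 5};          // histogram of a 2 x 5 image with K = 4
        long[] H = cumulative(h);
        System.out.println(java.util.Arrays.toString(H)); // [3, 3, 5, 10]
    }
}
```

The last entry, 10, equals the number of pixels (2 · 5), as Eqn. (3.7) requires.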

The cumulative histogram is useful not primarily for viewing but
as a simple and powerful tool for capturing statistical information
from an image. In particular, we will use it in the next chapter to
compute the parameters for several common point operations (see
Secs. 4.4–4.6).

3.7  Statistical  Information  from  the  Histogram 

Some common statistical parameters of the image can be conveniently
calculated directly from its histogram. For example, the minimum
and maximum pixel value of an image I can be obtained by simply
finding the smallest and largest histogram index with nonzero value,
i.e.,

$$\min(I) = \min\,\{ i \mid h(i) > 0 \}, \qquad \max(I) = \max\,\{ i \mid h(i) > 0 \}. \tag{3.8}$$

Fig. 3.13
The ordinary histogram h(i) and its associated cumulative
histogram H(i).

If we assume that the histogram is already available, the advantage
is that the calculation does not include the entire image but only the
relatively small set of histogram elements (typically 256).
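Finding the minimum and maximum from a histogram amounts to scanning for the first and last nonzero entries. A minimal sketch with hypothetical names. Returning `null` for an empty histogram is a choice made here.

```java
// Sketch: minimum and maximum pixel value from a histogram h.
class HistogramMinMax {

    // Returns {min, max}, or null if the histogram is empty (all zeros).
    static int[] minMax(int[] h) {
        int min = 0;
        while (min < h.length && h[min] == 0) min++;   // first nonzero entry
        if (min == h.length) return null;              // no pixels at all
        int max = h.length - 1;
        while (h[max] == 0) max--;                     // last nonzero entry
        return new int[] {min, max};
    }

    public static void main(String[] args) {
        int[] h = new int[256];
        h[12] = 4;
        h[200] = 1;
        System.out.println(java.util.Arrays.toString(minMax(h))); // [12, 200]
    }
}
```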

3.7.1  Mean  and  Variance 

The mean value μ of an image I (of size M × N) can be calculated
as

$$\mu = \frac{1}{MN}\cdot\sum_{u=0}^{M-1}\sum_{v=0}^{N-1} I(u,v) = \frac{1}{MN}\cdot\sum_{i=0}^{K-1} h(i)\cdot i, \tag{3.9}$$

i.e., either directly from the pixel values I(u,v) or indirectly from the
histogram h (of size K), where $MN = \sum_i h(i)$ is the total number of
pixels.

Analogously we can also calculate the variance of the pixel values
straight from the histogram as

$$\sigma^2 = \frac{1}{MN}\cdot\sum_{u=0}^{M-1}\sum_{v=0}^{N-1} \big[I(u,v)-\mu\big]^2 = \frac{1}{MN}\cdot\sum_{i=0}^{K-1} (i-\mu)^2\cdot h(i). \tag{3.10}$$

As we see in the right parts of Eqns. (3.9) and (3.10), there is no need
to access the original pixel values.

The formulation of the variance in Eqn. (3.10) assumes that the
arithmetic mean μ has already been determined. This is not necessary
though, since the mean and the variance can be calculated
together in a single iteration over the image pixels or the associated
histogram in the form

$$\mu = \frac{1}{MN}\cdot A \tag{3.11}$$

and

$$\sigma^2 = \frac{1}{MN}\cdot\Big(B - \frac{1}{MN}\cdot A^2\Big), \tag{3.12}$$

with the quantities

$$A = \sum_{u=0}^{M-1}\sum_{v=0}^{N-1} I(u,v) = \sum_{i=0}^{K-1} i\cdot h(i), \tag{3.13}$$

$$B = \sum_{u=0}^{M-1}\sum_{v=0}^{N-1} I^2(u,v) = \sum_{i=0}^{K-1} i^2\cdot h(i). \tag{3.14}$$

The  above  formulation  has  the  additional  numerical  advantage  that 
all  summations  can  be  performed  with  integer  values,  in  contrast  to 
Eqn.  (3.10)  which  requires  the  summation  of  floating-point  values. 
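The single-pass formulation can be sketched as follows. One loop over the histogram accumulates MN, A and B as exact `long` integers, and only the final division is done in floating point. The names are hypothetical. The variance is the population variance (division by MN), as in Eqn. (3.10).

```java
// Sketch: mean and variance from a histogram using the integer sums A and B
// of Eqns. (3.13) and (3.14), combined as in Eqns. (3.11) and (3.12).
class HistogramStats {

    // Returns {mean, variance}.
    static double[] meanVariance(int[] h) {
        long n = 0, A = 0, B = 0;   // n = MN, A = sum i*h(i), B = sum i^2*h(i)
        for (int i = 0; i < h.length; i++) {
            n += h[i];
            A += (long) i * h[i];
            B += (long) i * i * h[i];
        }
        if (n == 0) throw new IllegalArgumentException("empty histogram");
        double mean = (double) A / n;
        double var = (B - (double) A * A / n) / n;   // A*A in double avoids long overflow
        return new double[] {mean, var};
    }

    public static void main(String[] args) {
        int[] h = new int[256];
        h[2] = 1; h[4] = 1; h[6] = 1; h[8] = 1;   // pixel values 2, 4, 6, 8
        double[] mv = meanVariance(h);
        System.out.println(mv[0] + " " + mv[1]);  // 5.0 5.0
    }
}
```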


3.7.2  Median 


The median m of an image is defined as the smallest pixel value
that is greater than or equal to one half of all pixel values, i.e., lies "in the
middle" of the pixel values.⁸ The median can also be easily calculated
from the image's histogram.

To determine the median of an image I from the associated histogram,
it is sufficient to find the index i that separates the histogram
into two halves, such that the sums of the histogram entries to the left
and the right of i are approximately equal. In other words, i is the
smallest index where the sum of the histogram entries below (and
including) i corresponds to at least half of the image size, that is,

$$m = \min\Big\{ i \;\Big|\; \sum_{j=0}^{i} h(j) \ge \frac{MN}{2} \Big\}. \tag{3.15}$$

Since $\sum_{j=0}^{i} h(j) = H(i)$ (see Eqn. (3.5)), the median calculation can
be formulated even more simply as

$$m = \min\Big\{ i \;\Big|\; H(i) \ge \frac{MN}{2} \Big\}, \tag{3.16}$$

given the cumulative histogram H.
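Eqn. (3.16) can be evaluated while accumulating H on the fly. In this sketch the condition H(i) ≥ MN/2 is tested as 2·H(i) ≥ MN to stay in integer arithmetic. That rearrangement is our choice, and the names are hypothetical.

```java
// Sketch: median from the cumulative histogram, m = min{ i | H(i) >= MN/2 }.
class HistogramMedian {

    static int median(int[] h) {
        long n = 0;
        for (int c : h) n += c;              // MN, total number of pixels
        if (n == 0) throw new IllegalArgumentException("empty histogram");
        long H = 0;                          // running cumulative histogram H(i)
        for (int i = 0; i < h.length; i++) {
            H += h[i];
            if (2 * H >= n) return i;        // H(i) >= MN/2, without fractions
        }
        throw new IllegalStateException("unreachable for a nonempty histogram");
    }

    public static void main(String[] args) {
        int[] h = new int[256];
        h[5] = 3; h[9] = 1;                  // pixel values 5, 5, 5, 9
        System.out.println(median(h));       // 5
    }
}
```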


3.8  Block  Statistics 

3.8.1  Integral  Images 

Integral images (also known as summed area tables [58]) provide a
simple way for quickly calculating elementary statistics of arbitrary
rectangular sub-images. They have found use in several interesting
applications, such as fast filtering, adaptive thresholding, image
matching, local feature extraction, face detection, and stereo reconstruction
[20, 142, 244].

Given a scalar-valued (grayscale) image $I: M \times N \mapsto \mathbb{R}$, the associated
first-order integral image is defined as

$$L_1(u,v) = \sum_{i=0}^{u}\sum_{j=0}^{v} I(i,j). \tag{3.17}$$

8 See Sec. 5.4.2 for an alternative definition of the median.
Fig. 3.14
Block-based calculations with integral images. Only four samples
from the integral image L1 are required to calculate the sum of the
pixels inside the (green) rectangle R = (a, b), defined by the corner
coordinates a = (ua, va) and b = (ub, vb).
[Figure: regions A, B (top) and C, R (bottom), with the corners a and b of R marked]
Thus a value in L1 is the sum of all pixel values in the original image
I located to the left of and above the given position (u, v), inclusively.
The integral image can be calculated efficiently with a single pass
over the image I by using the recurrence relation

$$L_1(u,v) = \begin{cases} 0 & \text{for } u < 0 \text{ or } v < 0,\\ L_1(u-1,v) + L_1(u,v-1) - L_1(u-1,v-1) + I(u,v) & \text{for } u, v \ge 0, \end{cases} \tag{3.18}$$

for positions u = 0, ..., M−1 and v = 0, ..., N−1 (see Alg. 3.1).
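One common way to implement recurrence (3.18) without special-casing the borders is to allocate the integral image with one extra row and column of zeros. These zeros play the role of the values for u < 0 or v < 0. A minimal sketch under that assumption, with the image stored as I[v][u] and hypothetical names:

```java
// Sketch: first-order integral image L1 via the recurrence of Eqn. (3.18).
// A border of zeros (row 0 and column 0) stands in for u < 0 or v < 0.
class IntegralImage1 {

    // I is indexed I[v][u]; the result S has size (N+1) x (M+1) with
    // S[v+1][u+1] = L1(u, v) and S[0][*] = S[*][0] = 0.
    static long[][] integral(int[][] I) {
        int N = I.length, M = I[0].length;
        long[][] S = new long[N + 1][M + 1];
        for (int v = 0; v < N; v++)
            for (int u = 0; u < M; u++)
                S[v + 1][u + 1] = S[v + 1][u] + S[v][u + 1] - S[v][u] + I[v][u];
        return S;
    }

    public static void main(String[] args) {
        int[][] I = { {1, 2, 3}, {4, 5, 6} };   // M = 3, N = 2
        long[][] S = integral(I);
        System.out.println(S[2][3]);            // L1(2, 1) = 21, the sum of all pixels
    }
}
```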

Suppose now that we wanted to calculate the sum of the pixel
values in a given rectangular region R, defined by the corner positions
$a = (u_a, v_a)$, $b = (u_b, v_b)$, that is, the first-order block sum

$$S_1(R) = \sum_{i=u_a}^{u_b}\sum_{j=v_a}^{v_b} I(i,j), \tag{3.19}$$

from the integral image L1. As shown in Fig. 3.14, the quantity
$L_1(u_a-1, v_a-1)$ corresponds to the pixel sum within rectangle A,
and $L_1(u_b, v_b)$ is the pixel sum over all four rectangles A, B, C and
R, that is,

$$\begin{aligned} L_1(u_a-1, v_a-1) &= S_1(A),\\ L_1(u_b, v_a-1) &= S_1(A) + S_1(B),\\ L_1(u_a-1, v_b) &= S_1(A) + S_1(C),\\ L_1(u_b, v_b) &= S_1(A) + S_1(B) + S_1(C) + S_1(R). \end{aligned} \tag{3.20}$$

Thus $S_1(R)$ can be calculated as

$$\begin{aligned} S_1(R) &= \underbrace{S_1(A)+S_1(B)+S_1(C)+S_1(R)}_{L_1(u_b,v_b)} + \underbrace{S_1(A)}_{L_1(u_a-1,v_a-1)} - \underbrace{[S_1(A)+S_1(B)]}_{L_1(u_b,v_a-1)} - \underbrace{[S_1(A)+S_1(C)]}_{L_1(u_a-1,v_b)}\\ &= L_1(u_b,v_b) + L_1(u_a-1,v_a-1) - L_1(u_b,v_a-1) - L_1(u_a-1,v_b), \end{aligned} \tag{3.21}$$

that is, by taking only four samples from the integral image L1.
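The four-sample formula (3.21) can be checked against the direct summation (3.19). The sketch below repeats the zero-padded integral image construction so that it is self-contained. In the padded array S, the entry S[v+1][u+1] holds L1(u, v), so L1(ua−1, ·) and L1(·, va−1) need no bounds checks. The names are hypothetical.

```java
// Sketch: block sum S1(R) from four samples of the integral image (Eqn. (3.21)),
// checked against the direct summation of Eqn. (3.19).
class BlockSum {

    // Padded integral image: S[v+1][u+1] = L1(u, v), zeros in row/column 0.
    static long[][] integral(int[][] I) {
        int N = I.length, M = I[0].length;
        long[][] S = new long[N + 1][M + 1];
        for (int v = 0; v < N; v++)
            for (int u = 0; u < M; u++)
                S[v + 1][u + 1] = S[v + 1][u] + S[v][u + 1] - S[v][u] + I[v][u];
        return S;
    }

    // S1(R) for R with corners a = (ua, va), b = (ub, vb), both inclusive.
    static long blockSum(long[][] S, int ua, int va, int ub, int vb) {
        return S[vb + 1][ub + 1]        // L1(ub, vb)
             + S[va][ua]                // L1(ua-1, va-1)
             - S[va][ub + 1]            // L1(ub, va-1)
             - S[vb + 1][ua];           // L1(ua-1, vb)
    }

    // Direct summation over R (image indexed I[v][u]), for comparison.
    static long naiveSum(int[][] I, int ua, int va, int ub, int vb) {
        long s = 0;
        for (int j = va; j <= vb; j++)
            for (int i = ua; i <= ub; i++) s += I[j][i];
        return s;
    }

    public static void main(String[] args) {
        int[][] I = { {1, 2, 3, 4}, {5, 6, 7, 8}, {9, 10, 11, 12} };
        long[][] S = integral(I);
        System.out.println(blockSum(S, 1, 1, 2, 2) + " " + naiveSum(I, 1, 1, 2, 2)); // 34 34
    }
}
```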


3.8.2  Mean  Intensity 


Given the region size $N_R$ and the sum of the pixel values $S_1(R)$, the
average intensity value (mean) inside the rectangle R can now easily
be found as

$$\mu_R = \frac{1}{N_R}\cdot S_1(R), \tag{3.22}$$

with $S_1(R)$ as defined in Eqn. (3.21) and the region size

$$N_R = |R| = (u_b - u_a + 1)\cdot(v_b - v_a + 1). \tag{3.23}$$


3.8.3  Variance 

Calculating the variance inside a rectangular region R requires the
summation of squared intensity values, that is, tabulating

$$L_2(u,v) = \sum_{i=0}^{u}\sum_{j=0}^{v} I^2(i,j), \tag{3.24}$$

which can be performed analogously to Eqn. (3.18) in the form

$$L_2(u,v) = \begin{cases} 0 & \text{for } u < 0 \text{ or } v < 0,\\ L_2(u-1,v) + L_2(u,v-1) - L_2(u-1,v-1) + I^2(u,v) & \text{for } u, v \ge 0. \end{cases} \tag{3.25}$$

As in Eqns. (3.19)–(3.21), the sum of the squared values inside a given
rectangle R (i.e., the second-order block sum) can be obtained as

$$S_2(R) = \sum_{i=u_a}^{u_b}\sum_{j=v_a}^{v_b} I^2(i,j) = L_2(u_b,v_b) + L_2(u_a-1,v_a-1) - L_2(u_b,v_a-1) - L_2(u_a-1,v_b). \tag{3.26}$$

From this, the variance inside the rectangular region R is finally
calculated as

$$\sigma_R^2 = \frac{1}{N_R}\cdot\Big[S_2(R) - \frac{1}{N_R}\cdot\big(S_1(R)\big)^2\Big], \tag{3.27}$$

with $N_R$ as defined in Eqn. (3.23). In addition, certain higher-order
statistics can be efficiently calculated with summation tables in a
similar fashion.
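Mean and variance of a rectangle then need only eight samples, four from each integral image. A sketch using zero-padded integral images (S[v+1][u+1] = L(u, v)) with hypothetical names. It follows Eqns. (3.23) and (3.27).

```java
// Sketch: mean and variance inside a rectangle R from the first- and
// second-order integral images (zero-padded, image indexed I[v][u]).
class BlockVariance {

    // Returns {S1, S2} with S1[v+1][u+1] = L1(u, v) and S2[v+1][u+1] = L2(u, v).
    static long[][][] integrals(int[][] I) {
        int N = I.length, M = I[0].length;
        long[][] S1 = new long[N + 1][M + 1], S2 = new long[N + 1][M + 1];
        for (int v = 0; v < N; v++)
            for (int u = 0; u < M; u++) {
                long a = I[v][u];
                S1[v + 1][u + 1] = S1[v + 1][u] + S1[v][u + 1] - S1[v][u] + a;
                S2[v + 1][u + 1] = S2[v + 1][u] + S2[v][u + 1] - S2[v][u] + a * a;
            }
        return new long[][][] {S1, S2};
    }

    // Four-sample block sum over R = ((ua, va), (ub, vb)), inclusive.
    static long block(long[][] S, int ua, int va, int ub, int vb) {
        return S[vb + 1][ub + 1] + S[va][ua] - S[va][ub + 1] - S[vb + 1][ua];
    }

    // Returns {mean, variance} of the pixels in R.
    static double[] meanVar(long[][] S1, long[][] S2, int ua, int va, int ub, int vb) {
        long nR = (long) (ub - ua + 1) * (vb - va + 1);   // region size N_R
        double s1 = block(S1, ua, va, ub, vb);
        double s2 = block(S2, ua, va, ub, vb);
        return new double[] { s1 / nR, (s2 - s1 * s1 / nR) / nR };
    }

    public static void main(String[] args) {
        int[][] I = { {1, 2, 3, 4}, {5, 6, 7, 8}, {9, 10, 11, 12} };
        long[][][] T = integrals(I);
        double[] mv = meanVar(T[0], T[1], 1, 1, 2, 2);   // pixels 6, 7, 10, 11
        System.out.println(mv[0] + " " + mv[1]);         // 8.5 4.25
    }
}
```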

3.8.4  Practical  Calculation  of  Integral  Images 

Algorithm 3.1 shows how L1 and L2 can be calculated in a single
iteration over the original image I. Note that the accumulated values
in the integral images L1, L2 tend to become quite large. Even with
pictures of medium size and 8-bit intensity values, the range of 32-bit
integers is quickly exhausted (particularly when calculating L2). The
use of 64-bit integers (type long in Java) or larger is recommended to
avoid arithmetic overflow. A basic implementation of integral images
is available as part of the imagingbook library.⁹

9 Class imagingbook.lib.image.IntegralImage.
Alg. 3.1
Joint calculation of the integral images L1 and L2 for a scalar-valued
image I.

IntegralImage(I)
  Input: I, a scalar-valued input image with I(u,v) ∈ ℝ.
  Returns the first and second order integral images of I.

  (M, N) ← Size(I)
  Create maps L1, L2: M × N ↦ ℝ

  Process the first image line (v = 0):
  L1(0,0) ← I(0,0)
  L2(0,0) ← I²(0,0)
  for u ← 1, ..., M−1 do
      L1(u,0) ← L1(u−1,0) + I(u,0)
      L2(u,0) ← L2(u−1,0) + I²(u,0)

  Process the remaining image lines (v > 0):
  for v ← 1, ..., N−1 do
      L1(0,v) ← L1(0,v−1) + I(0,v)
      L2(0,v) ← L2(0,v−1) + I²(0,v)
      for u ← 1, ..., M−1 do
          L1(u,v) ← L1(u−1,v) + L1(u,v−1) − L1(u−1,v−1) + I(u,v)
          L2(u,v) ← L2(u−1,v) + L2(u,v−1) − L2(u−1,v−1) + I²(u,v)

  return (L1, L2)
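Alg. 3.1 translates almost line by line into Java. This sketch assumes an integer image stored as I[v][u] and, following the advice above, uses `long` for L1 and L2. It is an illustration with hypothetical names, not the imagingbook library's IntegralImage class.

```java
// Sketch: Java translation of Alg. 3.1 for an integer image I[v][u]
// (row v, column u). L1 and L2 use long, as recommended in the text.
class IntegralImages {

    // Returns {L1, L2}, both indexed [v][u] like the input.
    static long[][][] integralImages(int[][] I) {
        int N = I.length, M = I[0].length;   // N rows (height), M columns (width)
        long[][] L1 = new long[N][M], L2 = new long[N][M];
        // process the first image line (v = 0):
        L1[0][0] = I[0][0];
        L2[0][0] = (long) I[0][0] * I[0][0];
        for (int u = 1; u < M; u++) {
            long a = I[0][u];
            L1[0][u] = L1[0][u - 1] + a;
            L2[0][u] = L2[0][u - 1] + a * a;
        }
        // process the remaining image lines (v > 0):
        for (int v = 1; v < N; v++) {
            long a0 = I[v][0];
            L1[v][0] = L1[v - 1][0] + a0;
            L2[v][0] = L2[v - 1][0] + a0 * a0;
            for (int u = 1; u < M; u++) {
                long a = I[v][u];
                L1[v][u] = L1[v][u - 1] + L1[v - 1][u] - L1[v - 1][u - 1] + a;
                L2[v][u] = L2[v][u - 1] + L2[v - 1][u] - L2[v - 1][u - 1] + a * a;
            }
        }
        return new long[][][] {L1, L2};
    }

    public static void main(String[] args) {
        long[][][] L = integralImages(new int[][] { {1, 2}, {3, 4} });
        System.out.println(L[0][1][1] + " " + L[1][1][1]); // 10 30
    }
}
```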

3.9  Exercises 

Exercise 3.1. In Prog. 3.2, B and K are constants. Consider whether there
would be an advantage to computing the value of B/K outside of the
loop, and explain your reasoning.

Exercise 3.2. Develop an ImageJ plugin that computes the cumulative
histogram of an 8-bit grayscale image and displays it as a new
image, similar to H(i) in Fig. 3.13. Hint: Use the ImageProcessor
method int[] getHistogram() to retrieve the original image's histogram
values and then compute the cumulative histogram "in place"
according to Eqn. (3.6). Create a new (blank) image of appropriate
size (e.g., 256 × 150) and draw the scaled histogram data as black
vertical bars such that the maximum entry spans the full height of
the image. Program 3.3 shows how this plugin could be set up and
how a new image is created and displayed.

Exercise  3.3.  Develop  a  technique  for  nonlinear  binning  that  uses  a 
table  of  interval  limits  aj  (Eqn.  (3.2)). 

Exercise 3.4. Develop an ImageJ plugin that uses the Java methods
Math.random() or Random.nextInt(int n) to create an image
with random pixel values that are uniformly distributed in the range
[0, 255]. Analyze the image's histogram to determine how equally
distributed the pixel values truly are.

Exercise 3.5. Develop an ImageJ plugin that creates a random image
with a Gaussian (normal) distribution with mean value μ = 128
and standard deviation σ = 50. Use the standard Java method
double Random.nextGaussian() to produce normally-distributed
random numbers (with μ = 0 and σ = 1) and scale them appropriately
to pixel values. Analyze the resulting image histogram to
see if it shows a Gaussian distribution too.

Exercise 3.6. Implement the calculation of the arithmetic mean μ
and the variance σ² of a given grayscale image from its histogram h
(see Sec. 3.7.1). Compare your results to those returned by ImageJ's
Analyze > Measure tool (they should match exactly).

Exercise 3.7. Implement the first-order integral image (L1) calculation
described in Eqn. (3.18) and calculate the sum of pixel values
S1(R) inside a given rectangle R using Eqn. (3.21). Verify numerically
that the results are the same as with the naive formulation in
Eqn. (3.19).

Exercise  3.8.  Values  of  integral  images  tend  to  become  quite  large. 
Assume  that  32-bit  signed  integers  (int)  are  used  to  calculate  the 
integral  of  the  squared  pixel  values,  that  is,  L2  (see  Eqn.  (3.24)),  for 
an  8-bit  grayscale  image.  What  is  the  maximum  image  size  that  is 
guaranteed  not  to  cause  an  arithmetic  overflow?  Perform  the  same 
analysis  for  64-bit  signed  integers  (long). 

Exercise 3.9. Calculate the integral image L1 for a given image I,
convert it to a floating-point image (FloatProcessor) and display
the result. You will realize that integral images are without any
apparent structure and they all look more or less the same. Come
up with an efficient method for reconstructing the original image I
from L1.
Prog. 3.3
Creating and displaying a new image (ImageJ plugin). First,
we create a ByteProcessor object (hip, line 20) that is
subsequently filled. At this point, hip has no screen
representation and is thus not visible. Then, an associated
ImagePlus object is created (line 33) and displayed by
applying the show() method (line 34). Notice how a reference
to the original image is stored inside the setup() method
(line 10) so that its title can be retrieved later (line 29)
and used to compose the new image's title (lines 30 and 33).
If hip is changed after calling show(), then the method
updateAndDraw() of the associated ImagePlus object (him)
could be used to redisplay the image.


 1  import ij.ImagePlus;
 2  import ij.plugin.filter.PlugInFilter;
 3  import ij.process.ByteProcessor;
 4  import ij.process.ImageProcessor;
 5
 6  public class Create_New_Image implements PlugInFilter {
 7    ImagePlus im;
 8
 9    public int setup(String arg, ImagePlus im) {
10      this.im = im;
11      return DOES_8G + NO_CHANGES;
12    }
13
14    public void run(ImageProcessor ip) {
15      // obtain the histogram of ip:
16      int[] hist = ip.getHistogram();
17      int K = hist.length;
18
19      // create the histogram image:
20      ImageProcessor hip = new ByteProcessor(K, 100);
21      hip.setValue(255); // white = 255
22      hip.fill();
23
24      // draw the histogram values as black bars in hip here,
25      // for example, using hip.putPixel(u, v, 0)
26      // ...
27
28      // compose a nice title:
29      String imTitle = im.getShortTitle();
30      String histTitle = "Histogram of " + imTitle;
31
32      // display the histogram image:
33      ImagePlus him = new ImagePlus(histTitle, hip);
34      him.show();
35    }
36  }
4

Point Operations


Point operations perform a modification of the pixel values without
changing the size, geometry, or local structure of the image. Each
new pixel value b = I′(u,v) depends exclusively on the previous value
a = I(u,v) at the same position and is thus independent of any
other pixel value, in particular of any of its neighboring pixels.¹
The original pixel values a are mapped to the new values b by some
given function f, i.e.,

$$b = f\big(I(u,v)\big) \quad\text{or}\quad b = f(a). \tag{4.1}$$

If, as in this case, the function f() is independent of the image coordinates
(i.e., the same throughout the image), the operation is called
"global" or "homogeneous". Typical examples of homogeneous point
operations include, among others:

•  modifying  image  brightness  or  contrast, 

•  applying  arbitrary  intensity  transformations  (“curves”), 

•  inverting  images, 

•  quantizing  (or  “posterizing”)  images, 

•  global  thresholding, 

•  gamma  correction, 

•  color transformations,

•  etc. 

We will look at some of these techniques in more detail in the following.
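A homogeneous point operation applies the same function f at every position. For an 8-bit image, f therefore needs to be evaluated only once per possible value (256 times), after which each pixel is updated by a table lookup. The sketch assumes an image stored as int[][] and f given as a java.util.function.IntUnaryOperator. Clamping the tabulated results to [0, 255] anticipates Sec. 4.1.2 and is our addition. The names are hypothetical.

```java
import java.util.function.IntUnaryOperator;

// Sketch: a homogeneous point operation b = f(a) on an 8-bit image I[v][u].
class PointOperation {

    // Applies f in place; f is tabulated once for all 256 input values.
    static void apply(int[][] I, IntUnaryOperator f) {
        int[] table = new int[256];
        for (int a = 0; a < 256; a++) {
            int b = f.applyAsInt(a);
            table[a] = Math.max(0, Math.min(255, b));   // keep results in [0, 255]
        }
        for (int[] row : I)
            for (int u = 0; u < row.length; u++)
                row[u] = table[row[u]];                  // same f at every position
    }

    public static void main(String[] args) {
        int[][] I = { {0, 100, 250} };
        apply(I, a -> a + 10);                           // raise brightness by 10 units
        System.out.println(java.util.Arrays.toString(I[0])); // [10, 110, 255]
    }
}
```

Tabulating f only pays off because f does not depend on (u, v). For the nonhomogeneous case discussed next, this shortcut is not available in general.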

In contrast to Eqn. (4.1), the mapping function g() for a nonhomogeneous
point operation would also take into account the current
image coordinate (u, v), that is,

$$b = g\big(I(u,v), u, v\big) \quad\text{or}\quad b = g(a, u, v). \tag{4.2}$$

A typical nonhomogeneous operation is the local adjustment of contrast
or brightness used, for example, to compensate for uneven lighting
during image acquisition.
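A nonhomogeneous operation differs only in that the function also receives the pixel position. In this sketch the interface `PixelFunction` is a hypothetical helper. The left-to-right brightness ramp in `main` is an assumed lighting model chosen just for illustration.

```java
// Sketch: a nonhomogeneous point operation b = g(a, u, v) on an 8-bit image I[v][u].
class NonhomogeneousPointOp {

    interface PixelFunction { int apply(int a, int u, int v); }   // hypothetical helper type

    static void apply(int[][] I, PixelFunction g) {
        for (int v = 0; v < I.length; v++)
            for (int u = 0; u < I[v].length; u++) {
                int b = g.apply(I[v][u], u, v);       // result may depend on (u, v)
                I[v][u] = Math.max(0, Math.min(255, b));   // clamp to [0, 255]
            }
    }

    public static void main(String[] args) {
        int[][] I = { {100, 100, 100, 100} };
        // brighten pixels progressively towards the right, e.g. to offset
        // illumination that falls off from left to right (assumed model)
        apply(I, (a, u, v) -> a + 20 * u);
        System.out.println(java.util.Arrays.toString(I[0])); // [100, 120, 140, 160]
    }
}
```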

1  If  the  result  depends  on  more  than  one  pixel  value,  the  operation  is 
called  a  “filter”,  as  described  in  Chapter  5. 
4.1 Modifying Image Intensity

4.1.1  Contrast  and  Brightness 


Let us start with a simple example. Increasing the image's contrast
by 50% (i.e., by the factor 1.5) or raising the brightness by 10 units
can be expressed by the mapping functions

$$f_{\text{contr}}(a) = a \cdot 1.5 \quad\text{and}\quad f_{\text{bright}}(a) = a + 10, \tag{4.3}$$

respectively. The first operation is implemented as an ImageJ plugin
by the code shown in Prog. 4.1, which can easily be adapted to perform
any other type of point operation. Rounding to the nearest integer
values is accomplished by simply adding 0.5 before the truncation
effected by the (int) typecast in line 8 (this only works for nonnegative
values). Also note the use of the more efficient image processor methods
get() and set() (instead of getPixel() and putPixel()) in
this example.


Prog. 4.1
Point operation to increase the contrast by 50% (ImageJ
plugin). Note that in line 8 the result of the multiplication
of the integer pixel value by the constant 1.5 (implicitly of
type double) is of type double. Thus an explicit type cast
(int) is required to assign the value to the int variable b.
0.5 is added in line 8 to round to the nearest integer values.


 1  public void run(ImageProcessor ip) {
 2    int w = ip.getWidth();
 3    int h = ip.getHeight();
 4
 5    for (int v = 0; v < h; v++) {
 6      for (int u = 0; u < w; u++) {
 7        int a = ip.get(u, v);
 8        int b = (int) (a * 1.5 + 0.5);
 9        if (b > 255)
10          b = 255; // clamp to the maximum value (amax)
11        ip.set(u, v, b);
12      }
13    }
14  }


4.1.2  Limiting  Values  by  Clamping 

When implementing arithmetic operations on pixel values, we must
keep in mind that the calculated results must not exceed the admissible
range of pixel values for the given image type (e.g., [0, 255] in the
case of 8-bit grayscale images). Limiting the results to this range is commonly called "clamping"
and can be expressed in the form

$$b = \min\big(\max(f(a), a_{\min}), a_{\max}\big) = \begin{cases} a_{\min} & \text{for } f(a) < a_{\min},\\ a_{\max} & \text{for } f(a) > a_{\max},\\ f(a) & \text{otherwise}. \end{cases} \tag{4.4}$$


For  this  purpose,  line  10  of  Prog.  4.1  contains  the  statement 

if  (b  >  255)  b  =  255; 

which  limits  the  result  to  the  maximum  value  255.  Similarly,  one 
may  also  want  to  limit  the  results  to  the  minimum  value  (0)  to  avoid 
negative  pixel  values  (which  cannot  be  represented  by  this  type  of 
8-bit  image),  for  example,  by  the  statement 
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if  (b  <  0)  b  =  0; 




The above statement is not needed in Prog. 4.1 because the
intermediate results can never be negative in this particular operation.
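Outside ImageJ, the same pattern (map, round, clamp) can be written as a small generic helper. The following is a minimal sketch assuming the image is stored as a flat int array of 8-bit values; the class PointOp and its method apply are hypothetical names, not part of ImageJ. It uses Math.round instead of adding 0.5, so negative intermediate results are also rounded to the nearest integer before being clamped as in Eqn. (4.4).

```java
import java.util.function.DoubleUnaryOperator;

/** Hypothetical helper (not part of ImageJ): point operations on a flat pixel array. */
class PointOp {

    /**
     * Applies the mapping f to every pixel value, rounds the result to the
     * nearest integer and clamps it to [aMin, aMax], as in Eqn. (4.4).
     */
    static int[] apply(int[] pixels, DoubleUnaryOperator f, int aMin, int aMax) {
        int[] result = new int[pixels.length];
        for (int i = 0; i < pixels.length; i++) {
            // Math.round yields the nearest integer for negative values, too
            // (ties are rounded up), unlike (int) (x + 0.5)
            long b = Math.round(f.applyAsDouble(pixels[i]));
            result[i] = (int) Math.min(Math.max(b, aMin), aMax); // clamping
        }
        return result;
    }
}
```

For instance, PointOp.apply(pixels, a -> a * 1.5, 0, 255) corresponds to fcontr and a -> a + 10 to fbright in Eqn. (4.3); a darkening operation such as a -> a - 120 is clamped at 0 instead of producing negative values.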


4.1.3 Inverting Images

Inverting an intensity image is a simple point operation that reverses
the ordering of pixel values (by multiplying by −1) and adds a constant
value to map the result to the admissible range again. Thus for a pixel
value a = I(u, v) in the range [0, amax], the corresponding point
operation is

$$ f_{\mathrm{inv}}(a) = -a + a_{\max} = a_{\max} - a. \qquad (4.5) $$

The inversion of an 8-bit grayscale image with amax = 255 was the
task of our first plugin example in Sec. 2.2.4 (Prog. 2.1). Note that
in this case no clamping is required at all because the function always
maps to the original range of values. In ImageJ, this operation is
performed by the method invert() (for objects of type ImageProcessor)
and is also available through the Edit > Invert menu. Obviously,
inverting an image mirrors its histogram, as shown in Fig. 4.5(c).
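A minimal sketch of Eqn. (4.5), assuming a flat int array of pixel values; the class Invert and its helper histogram are hypothetical names used for illustration. The helper makes it easy to check the mirroring h'(i) = h(amax − i) mentioned above.

```java
/** Hypothetical sketch of the inversion in Eqn. (4.5) on a flat pixel array. */
class Invert {

    /** Replaces every pixel value a by aMax - a; the result stays in [0, aMax]. */
    static int[] invert(int[] pixels, int aMax) {
        int[] result = new int[pixels.length];
        for (int i = 0; i < pixels.length; i++) {
            result[i] = aMax - pixels[i]; // no clamping required
        }
        return result;
    }

    /** Ordinary histogram of pixel values in [0, K-1]. */
    static int[] histogram(int[] pixels, int K) {
        int[] h = new int[K];
        for (int a : pixels) {
            h[a]++;
        }
        return h;
    }
}
```

Applying invert twice restores the original values, since amax − (amax − a) = a.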


4.1.4 Threshold Operation

Thresholding an image is a special type of quantization that separates
the pixel values into two classes, depending upon a given threshold
value q that is usually constant. The threshold operation maps all
pixels to one of two fixed intensity values a0 or a1, that is,

$$ f_{\mathrm{threshold}}(a) = \begin{cases} a_0 & \text{for } a < q,\\ a_1 & \text{for } a \geq q, \end{cases} \qquad (4.6) $$

with 0 < q < amax. A common application is binarizing an intensity
image with the values a0 = 0 and a1 = 1.

ImageJ does provide a special image type (BinaryProcessor) for binary
images, but these are actually implemented as 8-bit intensity images
(just like ordinary intensity images) using the values 0 and 255.
ImageJ also provides the ImageProcessor method threshold(int level),
with level = q, to perform this operation, which can also be invoked
through the Image > Adjust > Threshold menu (see Fig. 4.1 for an
example). Thresholding affects the histogram by separating the
distribution into two entries at positions a0 and a1, as illustrated in
Fig. 4.2.
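The threshold function of Eqn. (4.6) translates directly into a conditional. This sketch (hypothetical class Threshold, flat int array of pixel values) follows the equation as written above; it is not a reimplementation of ImageJ’s threshold() method, whose handling of the boundary value a = q may differ.

```java
/** Hypothetical sketch of the threshold operation in Eqn. (4.6). */
class Threshold {

    /** Maps pixel values a < q to a0 and all values a >= q to a1. */
    static int[] threshold(int[] pixels, int q, int a0, int a1) {
        int[] result = new int[pixels.length];
        for (int i = 0; i < pixels.length; i++) {
            result[i] = (pixels[i] < q) ? a0 : a1;
        }
        return result;
    }
}
```

With q = 128, a0 = 0 and a1 = 255, the values {0, 127, 128, 255} become {0, 0, 255, 255}; with a1 = 1 instead, the same call produces a 0/1 binary image.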


Fig. 4.1

Threshold operation: original image (a) and corresponding histogram
(c); result after thresholding with ath = 128, a0 = 0, a1 = 255 (b)
and corresponding histogram (d); ImageJ’s interactive Threshold menu
(e).

Fig. 4.2

Effects of thresholding upon the histogram. The threshold value is
ath. The original distribution (a) is split and merged into two
isolated entries at a0 and a1 in the resulting histogram (b).


4.2 Point Operations and Histograms

We have already seen that the effects of a point operation on the
image’s histogram are quite easy to predict in some cases. For example,
increasing the brightness of an image by a constant value shifts the
entire histogram to the right, raising the contrast widens it, and
inverting the image flips the histogram. Although this appears rather
simple, it may be useful to look a bit more closely at the relationship
between point operations and the resulting changes in the histogram.

As the illustration in Fig. 4.3 shows, every entry (bar) at some
position i in the histogram maps to a set (of size h(i)) containing all
image pixels whose values are exactly i.²

If a particular histogram line is shifted as a result of some point
operation, then of course all pixels in the corresponding set are
equally modified and vice versa. So what happens when a point operation
(e.g., reducing image contrast) causes two previously separated
histogram lines to fall together at the same position i? The answer is
that the corresponding pixel sets are merged and the new common
histogram entry is the sum of the two (or more) contributing entries
(i.e., the size of the combined set). At this point, the elements in
the merged set are no longer distinguishable (or separable), so this
operation may have (perhaps unintentionally) caused an irreversible
reduction of dynamic range and thus a permanent loss of information
in that image.
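The merging effect can be reproduced with a handful of pixels. In this hypothetical sketch (class HistogramMerge, flat int array), the contrast is halved by f(a) = ⌊a/2⌋; the pixel values 10 and 11 both end up at 5, so their histogram entries add up and the two groups of pixels can no longer be told apart by any subsequent point operation.

```java
/** Hypothetical demo: a contrast-reducing point operation merges histogram lines. */
class HistogramMerge {

    /** Ordinary histogram of pixel values in [0, K-1]. */
    static int[] histogram(int[] pixels, int K) {
        int[] h = new int[K];
        for (int a : pixels) {
            h[a]++;
        }
        return h;
    }

    /** Halves the contrast with f(a) = floor(a / 2) (integer division, a >= 0). */
    static int[] halveContrast(int[] pixels) {
        int[] result = new int[pixels.length];
        for (int i = 0; i < pixels.length; i++) {
            result[i] = pixels[i] / 2;
        }
        return result;
    }
}
```

For the input {10, 11, 11, 12} the result is {5, 5, 5, 6}, so h'(5) = h(10) + h(11) = 3 while h'(6) = h(12) = 1.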


² Of course this is only true for ordinary histograms with an entry
for every single intensity value. If binning is used (see Sec. 3.4.1),
each histogram entry maps to pixels within a certain range of values.

Fig. 4.3

Histogram entries represent sets of pixels of the same value. If a
histogram line is moved as a result of some point operation, then all
pixels in the corresponding set are equally modified (a). If, due to
this operation, two histogram lines h(a1), h(a2) coincide on the same
index, the two corresponding pixel sets merge, yielding the combined
entry h'(a2) = h(a1) + h(a2), and the contained pixels become
indiscernible (b).


4.3 Automatic Contrast Adjustment

Automatic contrast adjustment (auto-contrast) is a point operation
whose task is to modify the pixels such that the available range of
values is fully covered. This is done by mapping the current darkest
and brightest pixels to the minimum and maximum intensity values,
respectively, and linearly distributing the intermediate values.

Let us assume that alo and ahi are the lowest and highest pixel
values found in the current image, whose full intensity range is
[amin, amax]. To stretch the image to the full intensity range (see
Fig. 4.4), we first map the smallest pixel value alo to zero,
subsequently increase the contrast by the factor
(amax − amin)/(ahi − alo), and finally shift to the target range by
adding amin. The mapping function for the auto-contrast operation is
thus defined as

$$ f_{\mathrm{ac}}(a) = a_{\min} + (a - a_{\mathrm{lo}}) \cdot \frac{a_{\max} - a_{\min}}{a_{\mathrm{hi}} - a_{\mathrm{lo}}}, \qquad (4.7) $$

provided that ahi ≠ alo; that is, the image contains at least two
different pixel values. For an 8-bit image with amin = 0 and
amax = 255, the function in Eqn. (4.7) simplifies to

$$ f_{\mathrm{ac}}(a) = (a - a_{\mathrm{lo}}) \cdot \frac{255}{a_{\mathrm{hi}} - a_{\mathrm{lo}}}. \qquad (4.8) $$

The target range [amin, amax] need not be the maximum available
range of values but can be any interval to which the image should
be mapped. Of course the method can also be used to reduce the
image contrast to a smaller range. Figure 4.5(b) shows the effects
of an auto-contrast operation on the corresponding histogram, where
the linear stretching of the intensity range results in regularly spaced
gaps in the new distribution.
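A minimal sketch of Eqn. (4.7), assuming a flat int array of pixel values and rounding the real-valued result to the nearest integer (the equation itself does not prescribe a rounding rule). AutoContrast is a hypothetical class name. The product is formed before the division so that exact results such as (60 − 20) · 255/100 = 102 are not disturbed by floating-point error.

```java
/** Hypothetical sketch of the auto-contrast mapping f_ac in Eqn. (4.7). */
class AutoContrast {

    static int[] autoContrast(int[] pixels, int aMin, int aMax) {
        // find the lowest and highest pixel values alo, ahi
        int aLo = Integer.MAX_VALUE, aHi = Integer.MIN_VALUE;
        for (int a : pixels) {
            aLo = Math.min(aLo, a);
            aHi = Math.max(aHi, a);
        }
        if (aHi == aLo) {
            throw new IllegalArgumentException("need at least two different pixel values");
        }
        int[] result = new int[pixels.length];
        for (int i = 0; i < pixels.length; i++) {
            // (a - alo) * (amax - amin) / (ahi - alo), multiplied before dividing
            double scaled = (double) (pixels[i] - aLo) * (aMax - aMin) / (aHi - aLo);
            result[i] = aMin + (int) Math.round(scaled);
        }
        return result;
    }
}
```

With the pixel values {20, 60, 120} and the target range [0, 255], the result is {0, 102, 255}; with the target range [10, 110] it is {10, 50, 110}.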
Fig. 4.4

Auto-contrast operation according to Eqn. (4.7). Original pixel values
a in the range [alo, ahi] are mapped linearly to the target range
[amin, amax].

Fig. 4.5

Effects of auto-contrast and inversion operations on the resulting
histograms. Original image (a), result of auto-contrast operation (b),
and inversion (c). The histogram entries are shown both linearly (black
bars) and logarithmically (gray bars).


4.4 Modified Auto-Contrast Operation

In practice, the mapping function in Eqn. (4.7) could be strongly
influenced by only a few extreme (low or high) pixel values, which may
not be representative of the main image content. This can be avoided
to a large extent by “saturating” a fixed percentage (plo, phi) of
pixels at the lower and upper ends of the target intensity range. To
accomplish this, we determine two limiting values a'lo, a'hi such that
a predefined quantile plo of all pixel values in the image I are
smaller than a'lo and another quantile phi of the values are greater
than a'hi (Fig. 4.6).


Fig. 4.6

Modified auto-contrast operation (Eqn. (4.11)). Predefined quantiles
(plo, phi) of image pixels, shown as dark areas at the left and right
ends of the histogram h(i), are “saturated” (i.e., mapped to the
extreme values of the target range). The intermediate values
(a = a'lo, ..., a'hi) are mapped linearly to the interval
amin, ..., amax.
The values a'lo, a'hi depend on the image content and can be easily
obtained from the image’s cumulative histogram³ H:

$$ a'_{\mathrm{lo}} = \min\{\, i \mid \mathrm{H}(i) \geq M \cdot N \cdot p_{\mathrm{lo}} \,\}, \qquad (4.9) $$

$$ a'_{\mathrm{hi}} = \max\{\, i \mid \mathrm{H}(i) \leq M \cdot N \cdot (1 - p_{\mathrm{hi}}) \,\}, \qquad (4.10) $$

where 0 < plo, phi < 1, plo + phi < 1, and M · N is the number of
pixels in the image. All pixel values outside (and including) a'lo and
a'hi are mapped to the extreme values amin and amax, respectively, and
intermediate values are mapped linearly to the interval [amin, amax].
Using this formulation, the mapping to minimum and maximum intensities
does not depend on singular extreme pixels only but can be based on a
representative set of pixels. The mapping function for the modified
auto-contrast operation can thus be defined as

$$ f_{\mathrm{mac}}(a) = \begin{cases} a_{\min} & \text{for } a \leq a'_{\mathrm{lo}},\\ a_{\min} + (a - a'_{\mathrm{lo}}) \cdot \dfrac{a_{\max} - a_{\min}}{a'_{\mathrm{hi}} - a'_{\mathrm{lo}}} & \text{for } a'_{\mathrm{lo}} < a < a'_{\mathrm{hi}},\\ a_{\max} & \text{for } a \geq a'_{\mathrm{hi}}. \end{cases} \qquad (4.11) $$


Usually the same value is taken for both upper and lower quantiles
(i.e., plo = phi = p), with p = 0.005, ..., 0.015 (0.5, ..., 1.5%)
being common values. For example, the auto-contrast operation in Adobe
Photoshop saturates 0.5% (p = 0.005) of all pixels at both ends of the
intensity range. Auto-contrast is a frequently used point operation and
is thus available in practically any image-processing software. ImageJ
implements the modified auto-contrast operation as the Auto button of
its Brightness/Contrast and Window/Level tools, which are available
through the Image > Adjust menu and shown in Fig. 4.7.
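The quantile limits of Eqns. (4.9) and (4.10) can be read off the cumulative histogram with two linear scans, and the result turned into a lookup table following Eqn. (4.11). This is a sketch under the assumptions that the input is an ordinary histogram h of length K and that results are rounded to the nearest integer; the class and method names are hypothetical.

```java
/** Hypothetical sketch of the modified auto-contrast operation, Eqns. (4.9) to (4.11). */
class ModifiedAutoContrast {

    /** Builds the lookup table f_mac for a histogram h of length K. */
    static int[] mapping(int[] h, double pLo, double pHi, int aMin, int aMax) {
        int K = h.length;
        long[] H = new long[K];              // cumulative histogram
        long sum = 0;
        for (int i = 0; i < K; i++) {
            sum += h[i];
            H[i] = sum;
        }
        long MN = H[K - 1];                  // number of pixels M * N

        int aLo = 0;                         // Eqn. (4.9): smallest i with H(i) >= MN * pLo
        while (aLo < K - 1 && H[aLo] < MN * pLo) {
            aLo++;
        }
        int aHi = K - 1;                     // Eqn. (4.10): largest i with H(i) <= MN * (1 - pHi)
        while (aHi > 0 && H[aHi] > MN * (1 - pHi)) {
            aHi--;
        }

        int[] f = new int[K];
        for (int a = 0; a < K; a++) {
            if (a <= aLo) {
                f[a] = aMin;                 // saturated at the low end
            } else if (a >= aHi) {
                f[a] = aMax;                 // saturated at the high end
            } else {                         // here aHi - aLo >= 2, so no division by zero
                f[a] = aMin + (int) Math.round((double) (a - aLo) * (aMax - aMin) / (aHi - aLo));
            }
        }
        return f;
    }
}
```

For a histogram of 1000 pixels with 5 outliers at each end (h(0) = h(255) = 5) and two peaks of 495 pixels at 50 and 150, p = 0.01 gives a'lo = 50 and a'hi = 149, so the outliers no longer determine the stretch.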


Fig. 4.7

ImageJ’s Brightness/Contrast tool (left) and Window/Level tool (right)
can be invoked through the Image > Adjust menu. The Auto button
displays the result of a modified auto-contrast operation. Apply must
be hit to actually modify the image.


4.5 Histogram Equalization

A frequent task is to adjust two different images in such a way that
their resulting intensity distributions are similar, for example, to
use them in a print publication or to make them easier to compare. The
goal of histogram equalization is to find and apply a point operation
such that the histogram of the modified image approximates a uniform
distribution (see Fig. 4.8). Since the histogram is a discrete
distribution and homogeneous point operations can only shift and merge
(but never split) histogram entries, we can only obtain an approximate
solution in general. In particular, there is no way to eliminate or
decrease individual peaks in a histogram, and a truly uniform
distribution is thus impossible to reach. Based on point operations, we
can thus modify the image only to the extent that the resulting
histogram is approximately uniform. The question is how good this
approximation can be and exactly which point operation (which clearly
depends on the image content) we must apply to achieve this goal.

We may get a first idea by observing that the cumulative histogram
(Sec. 3.6) of a uniformly distributed image is a linear ramp (wedge),
as shown in Fig. 4.8. So we can reformulate the goal as finding a point
operation that shifts the histogram lines such that the resulting
cumulative histogram is approximately linear, as illustrated in
Fig. 4.9.

Fig. 4.8

Histogram equalization (panels: original, modified). The idea is to
find and apply a point operation to the image (with original histogram
h) such that the histogram heq of the modified image approximates a
uniform distribution (top). The cumulative target histogram Heq must
thus be approximately wedge-shaped (bottom).

³ See Sec. 3.6.


Fig. 4.9

Histogram equalization on the cumulative histogram. A suitable point
operation b ← feq(a) shifts each histogram line from its original
position a to b (left or right) such that the resulting cumulative
histogram Heq is approximately linear.


The desired point operation feq() is simply obtained from the
cumulative histogram H of the original image as⁴

$$ f_{\mathrm{eq}}(a) = \left\lfloor \mathrm{H}(a) \cdot \frac{K-1}{M \cdot N} \right\rfloor \qquad (4.12) $$

for an image of size M × N with pixel values a in the range [0, K − 1].
The resulting function feq(a) in Eqn. (4.12) is monotonically
increasing, because H(a) is monotonic and K, M, N are all positive
constants. In the (unusual) case where an image is already uniformly
distributed, linear histogram equalization should not modify that image
any further. Also, repeated applications of linear histogram
equalization should not make any changes to the image after the first
time. Both requirements are fulfilled by the formulation in Eqn.
(4.12). Program 4.2 lists the Java code for a sample implementation of
linear histogram equalization. An example demonstrating the effects on
the image and the histograms is shown in Fig. 4.10.

⁴ For a derivation, see, for example, [88, p. 173].

Fig. 4.10

Linear histogram equalization example. Original image I (a) and
modified image I' (b), corresponding histograms h, h' (c, d), and
cumulative histograms H, H' (e, f). The resulting cumulative histogram
H' (f) approximates a uniformly distributed image. Notice that new
peaks are created in the resulting histogram h' (d) by merging original
histogram cells, particularly in the lower and upper intensity ranges.
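Because Eqn. (4.12) depends only on the histogram, the mapping can be precomputed as a lookup table with K entries and then applied to every pixel (e.g., with ImageJ’s applyTable(), which appears later in this chapter). The following is a minimal sketch with a hypothetical class name; it uses long arithmetic for the product H(a) · (K − 1), and integer division supplies the floor.

```java
/** Hypothetical sketch: lookup table for linear histogram equalization, Eqn. (4.12). */
class Equalize {

    static int[] equalizationTable(int[] h) {
        int K = h.length;
        long MN = 0;                              // number of pixels M * N
        for (int count : h) {
            MN += count;
        }
        int[] f = new int[K];
        long H = 0;                               // running cumulative histogram H(a)
        for (int a = 0; a < K; a++) {
            H += h[a];
            f[a] = (int) (H * (K - 1) / MN);      // integer division performs the floor
        }
        return f;
    }
}
```

For a histogram with exactly one pixel per value (K = 256), the table is the identity, so a uniformly distributed image is left unchanged. For h = (2, 0, 0, 2), the table is (1, 1, 1, 3); applying it gives the histogram (0, 2, 0, 2), whose own table maps 1 to 1 and 3 to 3, so a second pass changes nothing in this small example.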

Notice that for “inactive” pixel values i (i.e., pixel values that do
not appear in the image, with h(i) = 0), the corresponding entries in
the cumulative histogram H(i) are either zero or identical to the
neighboring entry H(i − 1). Consequently a contiguous range of zero
values in the histogram h(i) corresponds to a constant (i.e., flat)
range in the cumulative histogram H(i), and the function feq(a) maps
all “inactive” intensity values within such a range to the same result
as the next lower “active” value. This effect is not relevant, however,
since the image contains no such pixels anyway. Nevertheless, a linear
histogram equalization may (and typically will) cause histogram lines
to merge and consequently lead to a loss of dynamic range (see also
Sec. 4.2).

This or a similar form of linear histogram equalization is implemented
in almost any image-processing software. In ImageJ it can be invoked
interactively through the Process > Enhance Contrast menu (option
Equalize). To avoid extreme contrast effects, the histogram
equalization in ImageJ by default⁵ cumulates the square root of the
histogram entries using a modified cumulative histogram of the form

$$ \tilde{\mathrm{H}}(i) = \sum_{j=0}^{i} \sqrt{\mathrm{h}(j)}. \qquad (4.13) $$

Prog. 4.2

Linear histogram equalization (ImageJ plugin). First the histogram of
the image ip is obtained using the standard ImageJ method
ip.getHistogram() in line 7. In line 9, the cumulative histogram is
computed “in place” based on the recursive definition in Eqn. (3.6).
The integer division in line 16 implicitly performs the required floor
(⌊ ⌋) operation by truncation; the cast to long keeps the product
H[a] * (K - 1) from overflowing the int range for large images.

 1  public void run(ImageProcessor ip) {
 2    int M = ip.getWidth();
 3    int N = ip.getHeight();
 4    int K = 256; // number of intensity values
 5
 6    // compute the cumulative histogram:
 7    int[] H = ip.getHistogram();
 8    for (int j = 1; j < H.length; j++) {
 9      H[j] = H[j - 1] + H[j];
10    }
11
12    // equalize the image:
13    for (int v = 0; v < N; v++) {
14      for (int u = 0; u < M; u++) {
15        int a = ip.get(u, v);
16        int b = (int) ((long) H[a] * (K - 1) / (M * N)); // see Eqn. (4.12)
17        ip.set(u, v, b);
18      }
19    }
20  }

4.6 Histogram Specification

Although widely implemented, the goal of linear histogram equalization
(a uniform distribution of intensity values, as described in the
previous section) appears rather ad hoc, since good images virtually
never show such a distribution. In most real images, the distribution
of the pixel values is not even remotely uniform but is usually more
similar, if at all, to perhaps a Gaussian distribution. The images
produced by linear equalization thus usually appear quite unnatural,
which renders the technique practically useless.

Histogram specification is a more general technique that modifies the
image to match an arbitrary intensity distribution, including the
histogram of a given image. This is particularly useful, for example,
for adjusting a set of images taken by different cameras or under
varying exposure or lighting conditions to give a similar impression in
print production or when displayed. Similar to histogram equalization,
this process relies on the alignment of the cumulative histograms by
applying a homogeneous point operation. To be independent of the image
size (i.e., the number of pixels), we first define normalized
distributions, which we use in place of the original histograms.

⁵ The “classic” linear approach (see Eqn. (3.5)) is used when the Alt
key is held down at the same time.
4.6.1 Frequencies and Probabilities

The value in each histogram cell describes the observed frequency of
the corresponding intensity value, i.e., the histogram is a discrete
frequency distribution. For a given image I of size M × N, the sum of
all histogram entries h(i) equals the number of image pixels,

$$ \sum_{i=0}^{K-1} \mathrm{h}(i) = M \cdot N. \qquad (4.14) $$

The associated normalized histogram,

$$ \mathrm{p}(i) = \frac{\mathrm{h}(i)}{M \cdot N}, \quad \text{for } 0 \leq i < K, \qquad (4.15) $$

is usually interpreted as the probability distribution or probability
density function (pdf) of a random process, where p(i) is the
probability for the occurrence of the pixel value i. The cumulative
probability of i being any possible value is 1, and the distribution p
must thus satisfy

$$ \sum_{i=0}^{K-1} \mathrm{p}(i) = 1. \qquad (4.16) $$

The statistical counterpart to the cumulative histogram H (Eqn. (3.5))
is the discrete distribution function P() (also called the cumulative
distribution function or cdf),

$$ \mathrm{P}(i) = \frac{\mathrm{H}(i)}{\mathrm{H}(K-1)} = \frac{\mathrm{H}(i)}{M \cdot N} = \sum_{j=0}^{i} \frac{\mathrm{h}(j)}{M \cdot N} = \sum_{j=0}^{i} \mathrm{p}(j), \qquad (4.17) $$

for i = 0, ..., K − 1. The computation of the cdf from a given
histogram h is outlined in Alg. 4.1. The resulting function P(i) is
(as the cumulative histogram) monotonically increasing and, in
particular,

$$ \mathrm{P}(0) = \mathrm{p}(0) \qquad\text{and}\qquad \mathrm{P}(K-1) = \sum_{i=0}^{K-1} \mathrm{p}(i) = 1. \qquad (4.18) $$

This statistical formulation implicitly treats the generation of images
as a random process whose exact properties are mostly unknown.⁶
However, the process is usually assumed to be homogeneous (independent
of the image position); that is, each pixel value is the result of a
“random experiment” on a single random variable i. The observed
frequency distribution given by the histogram h(i) serves as a (coarse)
estimate of the probability distribution p(i) of this random variable.
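The cdf computation of Alg. 4.1 carries over almost line by line. The following sketch is not the book’s Prog. 4.3 but a self-contained version with a hypothetical class name; it sums the histogram in a long to avoid overflow for large images.

```java
/** Hypothetical sketch of Alg. 4.1: cumulative distribution function of a histogram. */
class Cdf {

    static double[] cdf(int[] h) {
        int K = h.length;
        long n = 0;                          // total number of pixels
        for (int count : h) {
            n += count;
        }
        double[] P = new double[K];
        long c = 0;
        for (int i = 0; i < K; i++) {
            c += h[i];                       // cumulate histogram values
            P[i] = (double) c / n;
        }
        return P;
    }
}
```

For h = (1, 0, 2, 1) it returns P = (0.25, 0.25, 0.75, 1.0); as stated in Eqn. (4.18), the last entry is always 1.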

4.6.2 Principle of Histogram Specification

The goal of histogram specification is to modify a given image IA by
some point operation such that its distribution function PA matches a
reference distribution PR as closely as possible. We thus look for a
mapping function

$$ a' = f_{\mathrm{hs}}(a) \qquad (4.19) $$

to convert the original image IA by a point operation to a new image
I'A with pixel values a', such that its distribution function P'A
matches PR, that is,

$$ \mathrm{P}'_A(i) \approx \mathrm{P}_R(i), \quad \text{for } 0 \leq i < K. \qquad (4.20) $$

As illustrated in Fig. 4.11, the desired mapping fhs is found by
combining the two distribution functions PR and PA (see [88, p. 180]
for details). For a given pixel value a in the original image, we
obtain the new pixel value a' as

$$ a' = \mathrm{P}_R^{-1}(\mathrm{P}_A(a)) = \mathrm{P}_R^{-1}(b), \quad \text{with } b = \mathrm{P}_A(a), \qquad (4.21) $$

and thus the mapping fhs (Eqn. (4.19)) is defined as

$$ f_{\mathrm{hs}}(a) = \mathrm{P}_R^{-1}(\mathrm{P}_A(a)), \quad \text{for } 0 \leq a < K. \qquad (4.22) $$

This of course assumes that PR(i) is invertible, that is, that the
function PR⁻¹(b) exists for b ∈ [0, 1].

⁶ Statistical modeling of the image generation process has a long
tradition (see, e.g., [128, Ch. 2]).

Alg. 4.1

Calculation of the cumulative distribution function (cdf) P(i) from a
given histogram h of length K. See Prog. 4.3 (p. 73) for the
corresponding Java implementation.

1: Cdf(h)
     Returns the cumulative distribution function P(i) ∈ [0, 1] for a
     given histogram h(i), with i = 0, ..., K − 1.
2:   Let K ← Size(h)
3:   Let n ← Σ_{i=0}^{K−1} h(i)
4:   Create map P: [0, K − 1] → ℝ
5:   Let c ← 0
6:   for i ← 0, ..., K − 1 do
7:       c ← c + h(i)                 ▷ cumulate histogram values
8:       P(i) ← c/n
9:   return P.

Fig. 4.11

Principle of histogram specification (panels: reference PR(i), left;
original PA(i), right). Given is the reference distribution PR (left)
and the distribution function for the original image PA (right). The
result is the mapping function fhs: a → a' for a point operation, which
replaces each pixel a in the original image IA by a modified value a'.
The process has two main steps: (A) For each pixel value a, determine
b = PA(a) from the right distribution function. (B) a' is then found by
inverting the left distribution function as a' = PR⁻¹(b). In summary,
the result is fhs(a) = a' = PR⁻¹(PA(a)).

4.6.3 Adjusting to a Piecewise Linear Distribution

If the reference distribution PR is given as a continuous, invertible
function, then the mapping function fhs can be obtained from Eqn.
(4.22) without any difficulty. In practice, it is convenient to specify
the (synthetic) reference distribution as a piecewise linear function
PL(i), that is, as a sequence of N + 1 coordinate pairs

L = [(a0, P0), (a1, P1), ..., (ak, Pk), ..., (aN, PN)],

each consisting of an intensity value ak and the corresponding
cumulative probability Pk. We assert that 0 ≤ ak < K, ak < ak+1, and
0 ≤ Pk ≤ 1. Also, the two endpoints (a0, P0) and (aN, PN) are fixed at

(0, P0) and (K − 1, 1),

respectively. To be invertible, the function must also be strictly
monotonic, that is, Pk < Pk+1 for 0 ≤ k < N. Figure 4.12 shows an
example of such a function, which is specified by N = 5 variable points
(P0, ..., P4) and a fixed end point P5 and thus consists of N = 5
linear segments. The reference distribution can of course be specified
at an arbitrary accuracy by inserting additional control points.

Fig. 4.12

Piecewise linear reference distribution. The function PL(i) is
specified by N = 5 control points (0, P0), (a1, P1), ..., (a4, P4),
with ak < ak+1 and Pk < Pk+1. The final point P5 is fixed at
(K − 1, 1).

The intermediate values of PL(i) are obtained by linear interpolation
between the control points as

$$ \mathrm{P}_{\mathrm{L}}(i) = \begin{cases} P_m + (i - a_m) \cdot \dfrac{P_{m+1} - P_m}{a_{m+1} - a_m} & \text{for } 0 \leq i < K-1,\\ 1 & \text{for } i = K-1, \end{cases} \qquad (4.23) $$

where m = max{ j ∈ [0, N − 1] | aj ≤ i } is the index of the line
segment (am, Pm) → (am+1, Pm+1), which overlaps the position i. For
instance, in the example in Fig. 4.12, the point a lies within the
segment that starts at point (a2, P2); i.e., m = 2.
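Eqn. (4.23) can be evaluated by first locating the segment index m and then interpolating. In this hypothetical sketch, the control points are passed as two parallel arrays a[k] and P[k] (k = 0, ..., N), assumed to satisfy the conditions listed above (a[0] = 0, a[N] = K − 1, P[N] = 1, both sequences strictly increasing).

```java
/** Hypothetical sketch of Eqn. (4.23): evaluates the piecewise linear cdf P_L(i). */
class PiecewiseLinearCdf {

    /** a[k], P[k]: the N+1 control points; i: intensity value in [0, K-1]. */
    static double eval(int i, int[] a, double[] P) {
        int N = a.length - 1;
        if (i >= a[N]) {
            return 1.0;                               // i = K-1
        }
        int m = 0;                                    // m = max{ j in [0, N-1] | a[j] <= i }
        while (m < N - 1 && a[m + 1] <= i) {
            m++;
        }
        // linear interpolation on the segment (a[m], P[m]) -> (a[m+1], P[m+1])
        return P[m] + (i - a[m]) * (P[m + 1] - P[m]) / (a[m + 1] - a[m]);
    }
}
```

With the control points (0, 0.25), (100, 0.5), (255, 1.0), the value at i = 50 is 0.25 + 50 · 0.25/100 = 0.375.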

For the histogram specification according to Eqn. (4.22), we also need
the inverse distribution function PL⁻¹(b) for b ∈ [0, 1]. As we see
from the example in Fig. 4.12, the function PL(i) is in general not
invertible for values b < PL(0). We can fix this problem by mapping all
values b < PL(0) to zero and thus obtain a “semi-inverse” of the
reference distribution in Eqn. (4.23) as

$$ \mathrm{P}_{\mathrm{L}}^{-1}(b) = \begin{cases} 0 & \text{for } 0 \leq b < \mathrm{P}_{\mathrm{L}}(0),\\ a_n + (b - P_n) \cdot \dfrac{a_{n+1} - a_n}{P_{n+1} - P_n} & \text{for } \mathrm{P}_{\mathrm{L}}(0) \leq b < 1,\\ K-1 & \text{for } b \geq 1. \end{cases} \qquad (4.24) $$

Here n = max{ j ∈ {0, ..., N − 1} | Pj ≤ b } is the index of the line
segment (an, Pn) → (an+1, Pn+1), which overlaps the argument value b.
The required mapping function fhs for adapting a given image with
intensity distribution PA is finally specified, analogous to Eqn.
(4.22), as

$$ f_{\mathrm{hs}}(a) = \mathrm{P}_{\mathrm{L}}^{-1}(\mathrm{P}_A(a)), \quad \text{for } 0 \leq a < K. \qquad (4.25) $$

The whole process of computing the pixel mapping function for a given
image (histogram) and a piecewise linear target distribution is
summarized in Alg. 4.2. A real example is shown in Fig. 4.14 (Sec.
4.6.5).

Alg. 4.2

Histogram specification using a piecewise linear reference
distribution. Given is the histogram h of the original image and a
piecewise linear reference distribution function, specified as a
sequence of N + 1 control points L. The discrete mapping fhs for the
corresponding point operation is returned.

1: MatchPiecewiseLinearHistogram(h, L)
     Input: h, histogram of the original image I; L, reference
     distribution function, given as a sequence of N + 1 control points
     L = [(a0, P0), (a1, P1), ..., (aN, PN)], with 0 ≤ ak < K,
     0 ≤ Pk ≤ 1, and Pk < Pk+1. Returns a discrete mapping fhs(a) to be
     applied to the original image I.
2:   N ← Size(L) − 1
3:   Let K ← Size(h)
4:   Let P ← Cdf(h)                         ▷ cdf for h (see Alg. 4.1)
5:   Create map fhs: [0, K − 1] → ℝ         ▷ function fhs
6:   for a ← 0, ..., K − 1 do
7:       b ← P(a)
8:       if (b < P0) then
9:           a' ← 0
10:      else if (b ≥ 1) then
11:          a' ← K − 1
12:      else
13:          n ← N − 1
14:          while (n > 0) ∧ (Pn > b) do    ▷ find line segment in L
15:              n ← n − 1
16:          a' ← an + (b − Pn) · (an+1 − an)/(Pn+1 − Pn)   ▷ see Eqn. (4.24)
17:      fhs(a) ← a'
18:  return fhs.
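A direct transcription of Alg. 4.2 into Java might look as follows. This is a sketch: the control points are passed as two parallel arrays (hypothetically named ctrlA and ctrlP), the cdf is computed inline as in Alg. 4.1, and, like the algorithm, it returns real-valued targets fhs(a) that still need to be rounded before they can serve as an integer lookup table.

```java
/**
 * Hypothetical sketch of Alg. 4.2: histogram specification with a piecewise
 * linear reference distribution given by control points (ctrlA[k], ctrlP[k]).
 */
class MatchPiecewiseLinear {

    static double[] match(int[] h, int[] ctrlA, double[] ctrlP) {
        int N = ctrlA.length - 1;            // N + 1 control points
        int K = h.length;

        // cdf P of the original image (as in Alg. 4.1)
        long n = 0;
        for (int count : h) {
            n += count;
        }
        double[] P = new double[K];
        long c = 0;
        for (int i = 0; i < K; i++) {
            c += h[i];
            P[i] = (double) c / n;
        }

        double[] fhs = new double[K];
        for (int a = 0; a < K; a++) {
            double b = P[a];
            if (b < ctrlP[0]) {
                fhs[a] = 0;
            } else if (b >= 1) {
                fhs[a] = K - 1;
            } else {
                int k = N - 1;               // find the line segment containing b
                while (k > 0 && ctrlP[k] > b) {
                    k--;
                }
                // semi-inverse of the reference distribution, Eqn. (4.24)
                fhs[a] = ctrlA[k] + (b - ctrlP[k]) * (ctrlA[k + 1] - ctrlA[k])
                        / (ctrlP[k + 1] - ctrlP[k]);
            }
        }
        return fhs;
    }
}
```

For a flat histogram h = (1, 1, 1, 1) and the control points (0, 0.5), (3, 1.0), the result is (0, 0, 1.5, 3); the value 0, whose cdf 0.25 lies below P0 = 0.5, is handled by the first case of Eqn. (4.24).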

4.6.4 Adjusting to a Given Histogram (Histogram Matching)

If we want to adjust one image to the histogram of another image, the
reference distribution function PR(i) is not continuous and thus, in
general, cannot be inverted (as required by Eqn. (4.22)). For example,
if the reference distribution contains zero entries (i.e., pixel values
k with probability p(k) = 0), the corresponding cumulative distribution
function P (just like the cumulative histogram) has intervals of
constant value on which no inverse function value can be determined.

In the following, we describe a simple method for histogram matching
that works with discrete reference distributions. The principal idea is
graphically illustrated in Fig. 4.13. The mapping function fhs is not
obtained by inverting but by “filling in” the reference distribution
function PR(i). For each possible pixel value a, starting with a = 0,
the corresponding probability pA(a) is stacked layer by layer “under”
the reference distribution PR. The thickness of each horizontal bar for
a equals the corresponding probability pA(a). The bar for a particular
intensity value a with thickness pA(a) runs from right to left, down to
position a', where it hits the reference distribution PR. This position
a' corresponds to the new pixel value to which a should be mapped.

Since the sum of all probabilities and the maximum of the distribution
function are both 1 (i.e., Σi pA(i) = maxi PR(i) = 1), all horizontal
bars will exactly fit underneath the function PR. One may also notice
in Fig. 4.13 that the distribution value resulting at a' is identical
to the cumulated probability PA(a). Given some intensity value a, it is
therefore sufficient to find the minimum value a' where the reference
distribution PR(a') is greater than or equal to the cumulative
probability PA(a), that is,

$$ f_{\mathrm{hs}}(a) = \min\{\, j \mid (0 \leq j < K) \wedge (\mathrm{P}_A(a) \leq \mathrm{P}_R(j)) \,\}. \qquad (4.26) $$

This results in a very simple method, which is summarized in Alg. 4.3. The corresponding Java implementation in Prog. 4.3 consists of the method matchHistograms(), which accepts the original histogram (hA) and the reference histogram (hR) and returns the resulting mapping function (fhs) specifying the required point operation.
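Eqn. (4.26) can also be transcribed literally as an upward search for the smallest qualifying $j$, which is useful for cross-checking the output of matchHistograms() in Prog. 4.3. The following is a minimal sketch, not the book's implementation. The class HistogramMatchingSketch and its methods are hypothetical, and histograms are plain int[] arrays of equal length $K$. Because $P_R$ is nondecreasing, the first $j$ reached is the minimum. Flat stretches of $P_R$ (zero entries in $h_R$) are simply passed over, which is exactly where direct inversion fails.

```java
/** Hypothetical helper class: a literal transcription of Eqn. (4.26). */
final class HistogramMatchingSketch {

    /** Cumulative distribution function of histogram h (same idea as Cdf() in Prog. 4.3). */
    static double[] cdf(int[] h) {
        long n = 0;                       // total number of pixels
        for (int count : h) {
            n += count;
        }
        double[] P = new double[h.length];
        long c = 0;                       // running sum of histogram entries
        for (int i = 0; i < h.length; i++) {
            c += h[i];
            P[i] = (double) c / n;
        }
        return P;
    }

    /** f_hs(a) = min{ j | 0 <= j < K and P_A(a) <= P_R(j) }, searched upward from j = 0. */
    static int[] matchByMinimum(int[] hA, int[] hR) {
        double[] PA = cdf(hA);
        double[] PR = cdf(hR);
        int K = hA.length;
        int[] fhs = new int[K];
        for (int a = 0; a < K; a++) {
            int j = 0;
            // skip every j whose reference CDF is still below P_A(a); the bound K - 1
            // keeps j inside the table (P_R(K-1) = 1 always satisfies the condition)
            while (j < K - 1 && PR[j] < PA[a]) {
                j++;
            }
            fhs[a] = j;
        }
        return fhs;
    }
}
```

Because $P_A(a)$ never decreases with $a$, the search could resume from the previous result instead of restarting at $j = 0$. That reduces the work from $O(K^2)$ to $O(K)$. The sketch keeps the literal form for readability.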

Due to the use of normalized distribution functions, the size of the associated images is not relevant. The following code fragment demonstrates the use of the matchHistograms() method from Prog. 4.3 in ImageJ:

ImageProcessor ipA = ...  // target image I_A (to be modified)
ImageProcessor ipR = ...  // reference image I_R

int[] hA = ipA.getHistogram();        // get histogram for I_A
int[] hR = ipR.getHistogram();        // get histogram for I_R

int[] fhs = matchHistograms(hA, hR);  // mapping function f_hs(a)

ipA.applyTable(fhs);                  // modify the target image I_A

The original image ipA is modified in the last line by applying the mapping function $f_{\mathrm{hs}}$ (fhs) with the method applyTable() (see also p. 83).

4.6.5  Examples 

Adjusting  to  a  piecewise  linear  reference  distribution 

The first example in Fig. 4.14 shows the results of histogram specification for a continuous, piecewise linear reference distribution, as


4.6  Histogram 
Specification 

Fig. 4.13

Discrete histogram specification. The reference distribution $P_R$ (left) is “filled” layer by layer from bottom to top and from right to left. For every possible intensity value $a$ (starting from $a = 0$), the associated probability $p_A(a)$ is added as a horizontal bar to a stack accumulated “under” the reference distribution $P_R$. The bar with thickness $p_A(a)$ is drawn from right to left down to the position $a'$, where the reference distribution $P_R$ is reached. The function $f_{\mathrm{hs}}()$ must map $a$ to $a'$.
4 Point Operations

Alg. 4.3

Histogram matching. Given are two histograms: the histogram $h_A$ of the target image $I_A$ and a reference histogram $h_R$, both of size $K$. The result is a discrete mapping function $f_{\mathrm{hs}}()$ that, when applied to the target image, produces a new image with a distribution function similar to the reference histogram.
MatchHistograms(h_A, h_R)
Input: h_A, histogram of the target image I_A; h_R, reference histogram (the same size as h_A). Returns a discrete mapping f_hs(a) to be applied to the target image I_A.

    K ← Size(h_A)
    P_A ← CDF(h_A)                     ▷ c.d.f. for h_A (Alg. 4.1)
    P_R ← CDF(h_R)                     ▷ c.d.f. for h_R (Alg. 4.1)
    Create map f_hs : [0, K−1] ↦ ℝ     ▷ pixel mapping function f_hs
    for a ← 0, ..., K−1 do
        j ← K−1
        repeat
            f_hs[a] ← j
            j ← j−1
        while (j ≥ 0) ∧ (P_A(a) ≤ P_R(j))
    return f_hs.


described in Sec. 4.6.3. Analogous to Fig. 4.12, the actual distribution function (Fig. 4.14(f)) is specified as a polygonal line consisting of six control points $(a_k, q_k)$ with coordinates

    k   =  0      1      2      3      4      5
    a_k =  0      28     75     150    210    255
    q_k =  0.002  0.050  0.250  0.750  0.950  1.000
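The following Java sketch makes the construction concrete. It evaluates the polygon through these six control points at every pixel value and assumes plain linear interpolation between neighboring points, which is our reading of the “polygonal line” (Sec. 4.6.3 is not reproduced here). The class PiecewiseLinearReference and its methods are hypothetical. Taking first differences of the resulting distribution shows why the reference histogram is a step function: on each linear segment, the difference is constant.

```java
/** Hypothetical sketch: reference distribution from control points (a_k, q_k). */
final class PiecewiseLinearReference {

    /**
     * Evaluates the polygon through the control points at every i in [0, K-1]
     * by linear interpolation. Assumes a[] is strictly increasing with
     * a[0] == 0 and a[last] == K - 1, and q[] is nondecreasing with q[last] == 1.
     */
    static double[] cdf(int[] a, double[] q, int K) {
        double[] P = new double[K];
        int m = 0;  // current segment runs from a[m] to a[m + 1]
        for (int i = 0; i < K; i++) {
            while (m < a.length - 2 && i >= a[m + 1]) {
                m++;  // move on to the segment that contains i
            }
            double t = (double) (i - a[m]) / (a[m + 1] - a[m]);
            P[i] = q[m] + t * (q[m + 1] - q[m]);
        }
        return P;
    }

    /** First differences of P: the corresponding (normalized) reference histogram. */
    static double[] histogram(double[] P) {
        double[] p = new double[P.length];
        p[0] = P[0];
        for (int i = 1; i < P.length; i++) {
            p[i] = P[i] - P[i - 1];
        }
        return p;
    }
}
```

matchHistograms() in Prog. 4.3 expects an integer histogram. The array P would therefore either be passed to a variant of the matching loop that takes $P_R$ directly, or be converted to counts first.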

The resulting reference histogram (Fig. 4.14(c)) is a step function with ranges of constant values corresponding to the linear segments of the cumulative distribution function. As expected, the cumulative probability function for the modified image (Fig. 4.14(h)) is quite close to the reference function in Fig. 4.14(f), while the resulting histogram (Fig. 4.14(e)) shows little similarity with the reference histogram (Fig. 4.14(c)). However, as discussed earlier, this is all we can expect from a homogeneous point operation.

Adjusting  to  an  arbitrary  reference  histogram 

The example in Fig. 4.15 demonstrates this technique using synthetic reference histograms whose shape is approximately Gaussian. In this case, the reference distribution is not given as a continuous function but specified by a discrete histogram. We thus use the method described in Sec. 4.6.4 to compute the required mapping functions.
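A Gaussian-shaped reference histogram of this kind can be generated directly. The sketch below shows one way to build it; the book does not specify its construction, so treat it as an assumption. The class GaussianReference and the parameter peak are hypothetical. The method returns integer counts so that the result can be passed as hR to matchHistograms() in Prog. 4.3. The absolute scale does not matter, because Cdf() normalizes the histogram anyway.

```java
/** Hypothetical sketch: a synthetic, Gaussian-shaped reference histogram. */
final class GaussianReference {

    /**
     * Returns h[i] = round(peak * exp(-(i - mu)^2 / (2 sigma^2))) for i in [0, K-1].
     * Only the shape matters, since Cdf() normalizes the histogram;
     * 'peak' merely sets the integer resolution of the counts.
     */
    static int[] histogram(int K, double mu, double sigma, int peak) {
        int[] h = new int[K];
        for (int i = 0; i < K; i++) {
            double x = (i - mu) / sigma;
            h[i] = (int) Math.round(peak * Math.exp(-0.5 * x * x));
        }
        return h;
    }
}
```

For the two cases in Fig. 4.15, one would call histogram(256, 128, 50, peak) and histogram(256, 128, 100, peak). A small peak value rounds the far tails to zero, which creates flat stretches in $P_R$. The matching loop handles those, but a larger peak preserves the shape more faithfully.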

The target image used here was chosen intentionally for its poor quality, manifested by an extremely unbalanced histogram. The histograms of the modified images thus naturally show little resemblance to a Gaussian. However, the resulting cumulative histograms match nicely with the integral of the corresponding Gaussians, apart from the unavoidable irregularity at the center caused by the dominant peak in the original histogram.

Adjusting  to  another  image 

The third example in Fig. 4.16 demonstrates the adjustment of two images by matching their intensity histograms. One of the images is selected as the reference image $I_R$ (Fig. 4.16(b)) and supplies the reference histogram $h_R$ (Fig. 4.16(e)). The second (target) image $I_A$ (Fig. 4.16(a)) is modified such that the resulting cumulative histogram matches the cumulative histogram of the reference image $I_R$. It can be expected that the final image $I_{A'}$ (Fig. 4.16(c)) and the reference image give a similar visual impression with regard to tonal range and distribution (assuming that both images show similar content).

Prog. 4.3

Histogram matching (Java implementation of Alg. 4.3). The method matchHistograms() computes the mapping function fhs from the target histogram hA and the reference histogram hR (see Eqn. (4.26)). The method Cdf() computes the cumulative distribution function (cdf) for a given histogram (Eqn. (4.17)).

int[] matchHistograms(int[] hA, int[] hR) {
    // hA ... histogram h_A of the target image I_A (to be modified)
    // hR ... reference histogram h_R
    // returns the mapping f_hs() to be applied to image I_A

    int K = hA.length;
    double[] PA = Cdf(hA);    // get CDF of histogram hA
    double[] PR = Cdf(hR);    // get CDF of histogram hR

    int[] fhs = new int[K];   // mapping f_hs()

    // compute mapping function f_hs():
    for (int a = 0; a < K; a++) {
        int j = K - 1;
        do {
            fhs[a] = j;
            j--;
        } while (j >= 0 && PA[a] <= PR[j]);
    }
    return fhs;
}

double[] Cdf(int[] h) {
    // returns the cumul. distribution function for histogram h
    int K = h.length;

    int n = 0;                   // sum all histogram values
    for (int i = 0; i < K; i++) {
        n += h[i];
    }

    double[] P = new double[K];  // create CDF table P
    int c = h[0];                // cumulate histogram values
    P[0] = (double) c / n;
    for (int i = 1; i < K; i++) {
        c += h[i];
        P[i] = (double) c / n;
    }
    return P;
}

Of course, this method may be used to adjust multiple images to the same reference image (e.g., to prepare a series of similar photographs for a print project). For this purpose, one could either select a single representative image as a common reference or, alternatively, compute an “average” reference histogram from a set of typical images (see also Exercise 4.7).
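One possible way to compute such an “average” reference histogram is to average the normalized histograms of the selected images, so that large and small images contribute equally. The following Java sketch shows this under that assumption. It is not the solution to Exercise 4.7, and the class AverageHistogram is hypothetical.

```java
/** Hypothetical sketch: average of normalized histograms as a common reference. */
final class AverageHistogram {

    /**
     * Divides each histogram by its own pixel count and averages the results,
     * so every image has the same weight regardless of its size. All histograms
     * must have the same length; the returned probabilities sum to 1.
     */
    static double[] average(int[]... histograms) {
        int K = histograms[0].length;
        double[] avg = new double[K];
        for (int[] h : histograms) {
            if (h.length != K) {
                throw new IllegalArgumentException("histograms differ in size");
            }
            long n = 0;  // pixel count of this image
            for (int count : h) {
                n += count;
            }
            for (int i = 0; i < K; i++) {
                avg[i] += (double) h[i] / n / histograms.length;
            }
        }
        return avg;
    }
}
```

The result is a probability vector. matchHistograms() as listed in Prog. 4.3 takes integer counts, so the vector would first have to be scaled and rounded, or Cdf() adapted to accept a double[].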
Fig. 4.14

Histogram specification with a piecewise linear reference distribution. The target image $I_A$ (a), its histogram (d), and distribution function $P_A$ (g); the reference histogram $h_R$ (c) and the corresponding distribution $P_R$ (f); the modified image $I_{A'}$ (b), its histogram $h_{A'}$ (e), and the resulting distribution $P_{A'}$ (h). Associated mapping function $f_{\mathrm{hs}}$ (j).


4.7  Gamma  Correction 

We have been using the terms “intensity” and “brightness” many times without really bothering with how the numeric pixel values in our images relate to these physical concepts, if at all. A pixel value may represent the amount of light falling onto a sensor element in a camera, the photographic density of film, the amount of light to be emitted by a monitor, the number of toner particles to be deposited by a printer, or any other relevant physical magnitude. In practice, the relationship between a pixel value and the corresponding physical quantity is usually complex and almost always nonlinear. In many imaging applications, it is important to know this relationship, at least approximately, to achieve consistent and reproducible results.

For digital intensity images, the ideal would be some kind of “calibrated intensity space” that optimally matches the human perception of intensity and requires a minimum number of bits to represent the required intensity range. Gamma correction denotes a simple point operation to compensate for the transfer characteristics of different input and output devices and to map them to a unified intensity space.



Fig. 4.15

Histogram matching: adjusting to a synthetic histogram. Original image $I_A$ (a), corresponding histogram (f), and cumulative histogram (i). Gaussian-shaped reference histograms with center $\mu = 128$ and $\sigma = 50$ (d) and $\sigma = 100$ (e), respectively. Resulting images after histogram matching, $I_{G50}$ (b) and $I_{G100}$ (c), with the corresponding histograms (g, h) and cumulative histograms (j, k). Associated mapping function $f_{\mathrm{hs}}$ (l).


4.7.1  Why  Gamma? 

The term “gamma” originates from analog photography, where the relationship between the light energy and the resulting film density is approximately logarithmic. The “exposure function” (Fig. 4.17), specifying the relationship between the logarithmic light intensity and the resulting film density, is therefore approximately linear over a wide range of light intensities. The slope of this function within this linear range is traditionally referred to as the “gamma” of the photographic material. The same term was adopted later in television broadcasting to describe the nonlinearities of the cathode ray tubes used in TV receivers, that is, to model the relationship between the amplitude (voltage) of the video signal and the emitted light intensity. To compensate for the nonlinearities of the receivers, a “gamma correction” was (and is) applied to the TV signal once before broadcasting in order to avoid the need for costly correction measures on the receiver side.

Fig. 4.16

Histogram matching: adjusting to a reference image. The target image $I_A$ (a) is modified by matching its histogram to the reference image $I_R$ (b), resulting in the new image $I_{A'}$ (c). The corresponding histograms $h_A$, $h_R$, $h_{A'}$ (d–f) and cumulative histograms $H_A$, $H_R$, $H_{A'}$ (g–i) are shown. Notice the good agreement between the cumulative histograms of the reference and adjusted images (h, i). Associated mapping function $f_{\mathrm{hs}}$ (j).


Fig. 4.17

Exposure function of photographic film. With respect to the logarithmic light intensity $B$, the resulting film density $D$ is approximately linear over a wide intensity range. The slope ($\Delta D / \Delta B$) of this linear section of the function specifies the “gamma” ($\gamma$) value for a particular type of photographic material.
4.7.2 Mathematical Definition


Gamma correction is based on the power function

$$ f_\gamma(a) = a^{\gamma}, \tag{4.27} $$

where the parameter $\gamma \in \mathbb{R}$ ($\gamma > 0$) is called the gamma value. If $a$ is constrained to the interval $[0,1]$, then, independent of $\gamma$, the value of $f_\gamma(a)$ also stays within $[0,1]$, and the function always runs through the points $(0,0)$ and $(1,1)$. In particular, $f_\gamma(a)$ is the identity function for $\gamma = 1$, as shown in Fig. 4.18. The function runs above the diagonal for gamma values $\gamma < 1$, and below it for $\gamma > 1$. Controlled by a single continuous parameter ($\gamma$), the power function can thus “imitate” both logarithmic and exponential types of functions. Within the interval $[0,1]$, the function is continuous and strictly monotonic, and also very simple to invert as

$$ a = f_\gamma^{-1}(b) = b^{1/\gamma}, \tag{4.28} $$

since $b^{1/\gamma} = (a^{\gamma})^{1/\gamma} = a^{1} = a$. The inverse of the power function, $f_\gamma^{-1}(b)$, is thus again a power function,

$$ f_\gamma^{-1}(b) = f_{\bar\gamma}(b) = f_{1/\gamma}(b), \tag{4.29} $$

with the parameter $\bar\gamma = 1/\gamma$.

Fig. 4.18

Gamma correction function $f_\gamma(a) = a^\gamma$ for $a \in [0,1]$ and different gamma values.
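Eqns. (4.27) to (4.29) translate directly into code. The Java sketch below evaluates $f_\gamma$ on $[0,1]$ and implements the inverse simply as the same function with parameter $1/\gamma$, as Eqn. (4.29) states. The class and method names are hypothetical.

```java
/** Hypothetical sketch of the gamma function and its inverse on [0, 1]. */
final class GammaFunction {

    /** f_gamma(a) = a^g, Eqn. (4.27); assumes a in [0, 1] and g > 0. */
    static double f(double a, double g) {
        return Math.pow(a, g);
    }

    /** f_gamma^{-1}(b) = f_{1/gamma}(b) = b^(1/g), Eqns. (4.28) and (4.29). */
    static double inverse(double b, double g) {
        return f(b, 1.0 / g);
    }
}
```

For example, f(0.5, 0.5) lies above the diagonal and f(0.5, 2.0) below it, in line with the behavior described for $\gamma < 1$ and $\gamma > 1$.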


4.7.3  Real  Gamma  Values 

The actual gamma values of individual devices are usually specified by the manufacturers based on real measurements. For example, common gamma values for CRT monitors are in the range 1.8 to 2.8, with 2.4 as a typical value. Most LCD monitors are internally adjusted to similar values. Digital video and still cameras also emulate the transfer characteristics of analog film and photographic cameras by making internal corrections to give the resulting images an accustomed “look”.
Fig. 4.19

Principle of gamma correction. To compensate the output signal $S$ produced by a camera with nominal gamma value $\gamma_c$, a gamma correction is applied with $\bar\gamma_c = 1/\gamma_c$. The corrected signal $S'$ is proportional to the received light intensity $L$.
In TV receivers, gamma values are standardized with 2.2 for analog NTSC and 2.8 for the PAL system (these values are theoretical; results of actual measurements are around 2.35). A gamma value of 1/2.2 ≈ 0.45 is the norm for cameras in NTSC as well as the EBU⁷ standards. The current international standard ITU-R BT.709⁸ calls for uniform gamma values of 2.5 in receivers and 1/1.956 ≈ 0.51 for cameras [76, 122]. The ITU 709 standard is based on a slightly modified version of the gamma correction (see Sec. 4.7.6).

Computers usually allow adjustment of the gamma value applied to the video output signals to adapt to a wide range of different monitors. Note, however, that the power function $f_\gamma()$ is only a coarse approximation to the actual transfer characteristics of any device, which may also not be the same for different color channels. Thus, significant deviations may occur in practice, despite the careful choice of gamma settings. Critical applications, such as prepress or high-end photography, usually require additional calibration efforts based on exactly measured device profiles (see Sec. 14.7.4).

4.7.4  Applications  of  Gamma  Correction 

Let us first look at the simple example illustrated in Fig. 4.19. Assume that we use a digital camera with a nominal gamma value $\gamma_c$, meaning that its output signal $S$ relates to the incident light intensity $L$ as

$$ S = L^{\gamma_c}. \tag{4.30} $$




To compensate the transfer characteristic of this camera (i.e., to obtain a measurement $S'$ that is proportional to the original light intensity $L$), the camera signal $S$ is subject to a gamma correction with the inverse of the camera’s gamma value, $\bar\gamma_c = 1/\gamma_c$, and thus

$$ S' = f_{\bar\gamma_c}(S) = S^{1/\gamma_c}. \tag{4.31} $$

The resulting signal

$$ S' = S^{1/\gamma_c} = \left(L^{\gamma_c}\right)^{1/\gamma_c} = L^{\gamma_c \cdot \frac{1}{\gamma_c}} = L^{1} $$

is obviously proportional (in theory even identical) to the original light intensity $L$. Although this example is quite simplistic, it still demonstrates the general rule, which holds for output devices as well:

7 European Broadcasting Union (EBU).

8 International Telecommunication Union (ITU).




The transfer characteristic of an input or output device with specified gamma value $\gamma$ is compensated for by a gamma correction with $\bar\gamma = 1/\gamma$.

So far, we have implicitly assumed that all values are strictly in the range $[0,1]$, which is usually not the case in practice. When working with digital images, we have to deal with discrete pixel values, for example, in the range $[0, 255]$ for 8-bit images. In general, performing a gamma correction

$$ b \leftarrow f_{\mathrm{gc}}(a, \gamma) $$

on a pixel value $a \in [0, a_{\max}]$ and a gamma value $\gamma > 0$ requires the following three steps:

1. Scale $a$ linearly to $\hat a \in [0,1]$.

2. Apply the gamma correction function to $\hat a$: $\hat b \leftarrow \hat a^{\gamma}$.

3. Scale $\hat b \in [0,1]$ linearly back to $b \in [0, a_{\max}]$.

Formulated in a more compact way, the corrected pixel value $b$ is obtained from the original value $a$ as

$$ b \leftarrow \left( \frac{a}{a_{\max}} \right)^{\gamma} \cdot a_{\max}. \tag{4.32} $$
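The three steps of Eqn. (4.32) can be sketched for a single pixel value as follows. The class DiscreteGamma is hypothetical. Rounding to the nearest integer in the last step mirrors what Prog. 4.4 does. The method assumes $0 \le a \le a_{\max}$ and $\gamma > 0$ without checking either.

```java
/** Hypothetical sketch of Eqn. (4.32) applied to a single discrete pixel value. */
final class DiscreteGamma {

    /** Returns round((a / aMax)^gamma * aMax); assumes 0 <= a <= aMax and gamma > 0. */
    static int correct(int a, double gamma, int aMax) {
        double aNorm = (double) a / aMax;        // step 1: scale a to [0, 1]
        double bNorm = Math.pow(aNorm, gamma);   // step 2: apply the power function
        return (int) Math.round(bNorm * aMax);   // step 3: scale back to [0, aMax]
    }
}
```

For example, with $\gamma = 0.5$ the 8-bit value 128 maps to $\mathrm{round}(\sqrt{128/255} \cdot 255) = 181$, while 0 and 255 stay fixed for any $\gamma$.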


Figure 4.20 illustrates the typical role of gamma correction in the digital work flow with two input devices (camera, scanner) and two output devices (monitor, printer), each with its individual gamma value. The central idea is to correct all images to be processed and stored in a device-independent, standardized intensity space.


Fig. 4.20

Gamma correction in the digital imaging work flow. Images are processed and stored in a “linear” intensity space, where gamma correction is used to compensate for the transfer characteristic of each input and output device. (The gamma values shown are examples only.)


4.7.5  Implementation 

Program 4.4 shows the implementation of gamma correction as an ImageJ plugin for 8-bit grayscale images. The mapping function $f_{\mathrm{gc}}(a, \gamma)$ is computed as a lookup table (Fgc), which is then applied to the image using the method applyTable() to perform the actual point operation (see also Sec. 4.8.1).
Prog. 4.4

Implementation of gamma correction in the run() method of an ImageJ plugin. The corrected intensity values b are computed only once and stored in the lookup table Fgc (inside the for loop). The gamma value GAMMA is constant. The actual point operation is performed by calling the ImageJ method applyTable(Fgc) on the image object ip (last statement).
public void run(ImageProcessor ip) {
    // works for 8-bit images only
    int K = 256;
    int aMax = K - 1;
    double GAMMA = 2.8;

    // create and fill the lookup table:
    int[] Fgc = new int[K];

    for (int a = 0; a < K; a++) {
        double aa = (double) a / aMax;     // scale to [0,1]
        double bb = Math.pow(aa, GAMMA);   // power function
        // scale back to [0,255]:
        int b = (int) Math.round(bb * aMax);
        Fgc[a] = b;
    }

    ip.applyTable(Fgc);  // modify the image
}
4.7.6 Modified Gamma Correction


A subtle problem with the simple power function $f_\gamma(a) = a^\gamma$ (Eqn. (4.27)) appears if we take a closer look at the slope of this function, expressed by its first derivative,

$$ f'_\gamma(a) = \gamma \cdot a^{(\gamma - 1)}, $$

which for $a = 0$ has the values

$$ f'_\gamma(0) = \begin{cases} 0 & \text{for } \gamma > 1, \\ 1 & \text{for } \gamma = 1, \\ \infty & \text{for } \gamma < 1. \end{cases} \tag{4.33} $$

The tangent to the function at the origin is thus horizontal ($\gamma > 1$), diagonal ($\gamma = 1$), or vertical ($\gamma < 1$), with no intermediate values. For $\gamma < 1$, this causes extremely high amplification of small intensity values and thus increased noise in dark image regions. Theoretically, this also means that the power function is generally not invertible at the origin.

A common solution to this problem is to replace the lower part ($0 \le a \le a_0$) of the power function by a linear segment with constant slope and to continue with the ordinary power function for $a > a_0$. The resulting modified gamma correction function,

$$ f_{\gamma,a_0}(a) = \begin{cases} s \cdot a & \text{for } 0 \le a \le a_0, \\ (1 + d) \cdot a^{\gamma} - d & \text{for } a_0 < a \le 1, \end{cases} \tag{4.34} $$

with

$$ s = \frac{\gamma}{a_0(\gamma - 1) + a_0^{(1-\gamma)}} \quad\text{and}\quad d = \frac{1}{a_0^{\gamma}(\gamma - 1) + 1} - 1, \tag{4.35} $$

thus consists of a linear section (for $0 \le a \le a_0$) and a nonlinear section (for $a_0 < a \le 1$) that connect smoothly at the transition point $a = a_0$. The linear slope $s$ and the parameter $d$ are determined by the requirement that the two function segments must have identical values as well as identical slopes (first derivatives) at $a = a_0$, so that the combined function is continuous and has a continuous first derivative. The function in Eqn. (4.34) is thus fully specified by the two parameters $a_0$ and $\gamma$.

Fig. 4.21

Modified gamma correction. The mapping $f_{\gamma,a_0}(a)$ consists of a linear segment with fixed slope $s$ between $a = 0$ and $a = a_0$, followed by a power function with parameter $\gamma$ (Eqn. (4.34)). The dashed lines show the ordinary power functions for the same gamma values.
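Eqn. (4.35) is easy to check numerically. The following Java sketch computes $s$ and $d$ from $\gamma$ and $a_0$ and evaluates Eqn. (4.34); the class ModifiedGamma is hypothetical. With $\gamma = 0.45$ and $a_0 = 0.018$, it should give $s \approx 4.5068$ and $d \approx 0.09915$. These match the ITU-R BT.709 values quoted later in this section.

```java
/** Hypothetical sketch of the modified gamma correction, Eqns. (4.34) and (4.35). */
final class ModifiedGamma {
    final double gamma, a0, s, d;

    ModifiedGamma(double gamma, double a0) {
        this.gamma = gamma;
        this.a0 = a0;
        // slope of the linear segment, Eqn. (4.35)
        this.s = gamma / (a0 * (gamma - 1) + Math.pow(a0, 1 - gamma));
        // offset of the power segment, Eqn. (4.35)
        this.d = 1 / (Math.pow(a0, gamma) * (gamma - 1) + 1) - 1;
    }

    /** f_{gamma,a0}(a) for a in [0, 1], Eqn. (4.34). */
    double apply(double a) {
        return (a <= a0) ? s * a : (1 + d) * Math.pow(a, gamma) - d;
    }
}
```

At $a = a_0$, both branches give the same value, and their derivatives ($s$ and $(1+d)\,\gamma\, a_0^{\gamma-1}$) agree. These are the two conditions from which Eqn. (4.35) is derived.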

Figure 4.21 shows two examples of the modified gamma correction $f_{\gamma,a_0}()$ with values $\gamma = 0.5$ and $\gamma = 2.0$, respectively. In both cases, the transition point is at $a_0 = 0.2$. For comparison, the figure also shows the ordinary gamma correction $f_\gamma(a)$ for the same gamma values (dashed lines), whose slope at the origin is $\infty$ (Fig. 4.21(a)) and zero (Fig. 4.21(b)), respectively.

Gamma  correction  in  common  standards 

The modified gamma correction is part of several modern imaging standards. In practice, however, the values of $a_0$ are considerably smaller than the ones used for the illustrative examples in Fig. 4.21, and $\gamma$ is chosen to obtain a good overall match to the desired correction function. For example, the ITU-R BT.709 specification [122] mentioned in Sec. 4.7.3 specifies the parameters

$$ \gamma = \frac{1}{2.222} \approx 0.45 \quad\text{and}\quad a_0 = 0.018, \tag{4.36} $$

with the corresponding slope and offset values $s = 4.50681$ and $d = 0.0991499$, respectively (Eqn. (4.35)). The resulting correction function $f_{\mathrm{ITU}}(a)$ has a nominal gamma value of 0.45, which corresponds to the effective gamma value $\gamma_{\mathrm{eff}} = 1/1.956 \approx 0.511$. The gamma correction in the sRGB standard [224] is specified on the same basis (with different parameters; see Sec. 14.4).

Figure 4.22 shows the actual correction functions for the ITU and sRGB standards, respectively, each in comparison with the equivalent ordinary gamma correction. The ITU function (Fig. 4.22(a)) with $\gamma = 0.45$ and $a_0 = 0.018$ corresponds to an ordinary gamma correction with effective gamma value $\gamma_{\mathrm{eff}} = 0.511$ (dashed line). The curves for sRGB (Fig. 4.22(b)) differ only in the parameters $\gamma$ and $a_0$, as summarized in Table 4.1.
Fig. 4.22

Gamma correction functions specified by the ITU-R BT.709 (a) and sRGB (b) standards. The continuous plot shows the modified gamma correction with the nominal $\gamma$ values and transition points $a_0$.


Table 4.1

Gamma correction parameters for the ITU and sRGB standards based on the modified mapping in Eqns. (4.34) and (4.35).

Standard        Nominal gamma value γ   a_0         s       d       Effective gamma value γ_eff
ITU-R BT.709    1/2.222 ≈ 0.450         0.018       4.50    0.099   1/1.956 ≈ 0.511
sRGB            1/2.400 ≈ 0.417         0.0031308   12.92   0.055   1/2.200 ≈ 0.455

Inverting  the  modified  gamma  correction 

To invert the modified gamma correction of the form $b = f_{\gamma,a_0}(a)$ (Eqn. (4.34)), we need the inverse of the function $f_{\gamma,a_0}()$, which is again defined in two parts,

$$ f^{-1}_{\gamma,a_0}(b) = \begin{cases} b/s & \text{for } 0 \le b \le s \cdot a_0, \\ \left( \dfrac{b + d}{1 + d} \right)^{1/\gamma} & \text{for } s \cdot a_0 < b \le 1, \end{cases} \tag{4.37} $$

where $s$ and $d$ are the quantities defined in Eqn. (4.35), and thus

$$ a = f^{-1}_{\gamma,a_0}\big(f_{\gamma,a_0}(a)\big) \quad \text{for } a \in [0,1], \tag{4.38} $$

with the same value $\gamma$ being used in both functions. The inverse gamma correction function is required in particular for transforming between different color spaces if nonlinear (i.e., gamma-corrected) component values are involved (see also Sec. 14.2).
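A corresponding sketch of Eqn. (4.37) follows. The class ModifiedGammaInverse is hypothetical. It repeats the forward mapping of Eqn. (4.34) only so that the round trip of Eqn. (4.38) can be checked on its own. The branch test compares $b$ with $s \cdot a_0$, the value that the linear segment reaches at the transition point.

```java
/** Hypothetical sketch: inverse of the modified gamma correction, Eqn. (4.37). */
final class ModifiedGammaInverse {
    final double gamma, a0, s, d;

    ModifiedGammaInverse(double gamma, double a0) {
        this.gamma = gamma;
        this.a0 = a0;
        this.s = gamma / (a0 * (gamma - 1) + Math.pow(a0, 1 - gamma));  // Eqn. (4.35)
        this.d = 1 / (Math.pow(a0, gamma) * (gamma - 1) + 1) - 1;        // Eqn. (4.35)
    }

    /** Forward mapping f_{gamma,a0}(a), Eqn. (4.34), repeated here for the round trip. */
    double forward(double a) {
        return (a <= a0) ? s * a : (1 + d) * Math.pow(a, gamma) - d;
    }

    /** Inverse mapping: b/s on the linear part, ((b + d)/(1 + d))^(1/gamma) above s*a0. */
    double inverse(double b) {
        return (b <= s * a0) ? b / s : Math.pow((b + d) / (1 + d), 1 / gamma);
    }
}
```

The same construction works for $\gamma > 1$ (for instance, the $\gamma = 2.0$, $a_0 = 0.2$ example of Fig. 4.21). In that case, $d$ becomes negative, but $b + d$ remains positive above the transition point.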


4.8  Point  Operations  in  ImageJ 

Several important types of point operations are already implemented in ImageJ, so there is no need to program every operation manually (as shown in Prog. 4.4). In particular, it is possible in ImageJ to apply point operations efficiently by using tabulated functions, to use built-in standard functions for point operations on single images, and to apply arithmetic operations on pairs of images. These issues are described briefly in the remaining parts of this section.

4.8.1  Point  Operations  with  Lookup  Tables 

Some point operations require complex computations for each pixel, and the processing of large images may be quite time-consuming. If the point operation is homogeneous (i.e., independent of the pixel coordinates), the value of the mapping function can be precomputed for every possible pixel value and stored in a lookup table, which may then be applied very efficiently to the image. A lookup table $F$ represents a discrete mapping (function $f$) from the original to the new pixel values,

$$ F : [0, K-1] \mapsto [0, K-1]. \tag{4.39} $$

For a point operation specified by a particular pixel mapping function $a' = f(a)$, the table $F$ is initialized with the values

$$ F[a] \leftarrow f(a), \quad \text{for } 0 \le a < K. \tag{4.40} $$


Thus, the $K$ table elements of $F$ need only be computed once, where typically $K = 256$. Performing the actual point operation only requires a simple (and quick) table lookup in $F$ at each pixel, that is,

$$ I'(u,v) \leftarrow F[I(u,v)], \tag{4.41} $$

which is much more efficient than any individual function call. ImageJ provides the method

void applyTable(int[] F)

for objects of type ImageProcessor, which requires a lookup table F as a 1D int array of size K (see Prog. 4.4 on page 80 for an example). The advantage of this approach is obvious: for an 8-bit image, for example, the mapping function is evaluated only 256 times (independent of the image size) and not a million times or more as in the case of a large image. The use of lookup tables for implementing point operations thus always makes sense if the number of image pixels ($M \times N$) is greater than the number of possible pixel values $K$ (which is usually the case).
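The two phases can be sketched in plain Java without ImageJ: the table is filled once (Eqn. (4.40)), and then every pixel is looked up (Eqn. (4.41)). This is an illustration, not ImageJ's applyTable(). The class LookupTableDemo is hypothetical, and pixels are held in a flat int[] rather than in ImageJ's internal pixel arrays.

```java
/** Hypothetical sketch of lookup-table point operations, Eqns. (4.40) and (4.41). */
final class LookupTableDemo {

    /** Fills F[a] = f(a) for 0 <= a < K; f is evaluated exactly K times. */
    static int[] makeTable(java.util.function.IntUnaryOperator f, int K) {
        int[] F = new int[K];
        for (int a = 0; a < K; a++) {
            F[a] = f.applyAsInt(a);
        }
        return F;
    }

    /** Replaces every pixel value p by F[p], in place (Eqn. (4.41)). */
    static void applyTable(int[] pixels, int[] F) {
        for (int i = 0; i < pixels.length; i++) {
            pixels[i] = F[pixels[i]];
        }
    }
}
```

Counting the calls of f inside makeTable() confirms that the mapping function runs exactly K times, however many pixels applyTable() processes afterward.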


4.8.2  Arithmetic  Operations 

ImageJ implements a set of common arithmetic operations as methods for the class ImageProcessor, which are summarized in Table 4.2. In the following example, the image is multiplied by a scalar constant (1.5) to increase its contrast:

ImageProcessor ip = ...  // some image
ip.multiply(1.5);

The image ip is destructively modified by all of these methods, with the results being limited (clamped) to the minimum and maximum pixel values, respectively.
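The clamping behavior, together with the rounding listed for multiply() in Table 4.2, can be emulated on plain 8-bit values as follows. This is a hypothetical sketch of the described semantics, not ImageJ's source code, and ClampedMultiply is an illustrative name.

```java
/** Hypothetical emulation of the documented multiply(s) semantics on 8-bit values. */
final class ClampedMultiply {

    /** Replaces each value by round(value * s), clamped to [0, 255], in place. */
    static void multiply(int[] pixels, double s) {
        for (int i = 0; i < pixels.length; i++) {
            long v = Math.round(pixels[i] * s);                  // round(I(u,v) * s), cf. Table 4.2
            pixels[i] = (int) Math.max(0L, Math.min(255L, v));   // clamp to the 8-bit range
        }
    }
}
```

With s = 1.5, the values 0, 100, 101, 170, and 200 become 0, 150, 152, 255, and 255. The product 151.5 is rounded up, and 300 is clamped to 255.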


4.8.3  Point  Operations  Involving  Multiple  Images 

Point operations may involve more than one image at once, with arithmetic operations on the pixels of pairs of images being a special but important case. For example, we can express the pointwise addition of two images $I_1$ and $I_2$ (of identical size) to create a new image $I'$ as

$$ I'(u,v) \leftarrow I_1(u,v) + I_2(u,v) \tag{4.42} $$

for all positions $(u,v)$. In general, any function $f(a_1, a_2, \ldots, a_n)$ over $n$ pixel values $a_i$ may be defined to perform pointwise combinations of $n$ images, that is,

$$ I'(u,v) \leftarrow f\big(I_1(u,v), I_2(u,v), \ldots, I_n(u,v)\big). \tag{4.43} $$

Of course, most arithmetic operations on multiple images can also be implemented as successive binary operations on pairs of images.

Table 4.2

ImageJ methods for arithmetic operations applicable to objects of type ImageProcessor.

void abs()                 I(u,v) ← |I(u,v)|
void add(int p)            I(u,v) ← I(u,v) + p
void gamma(double g)       I(u,v) ← (I(u,v)/255)^g · 255
void invert(int p)         I(u,v) ← 255 − I(u,v)
void log()                 I(u,v) ← log10(I(u,v))
void max(double s)         I(u,v) ← max(I(u,v), s)
void min(double s)         I(u,v) ← min(I(u,v), s)
void multiply(double s)    I(u,v) ← round(I(u,v) · s)
void sqr()                 I(u,v) ← I(u,v)^2
void sqrt()                I(u,v) ← √I(u,v)
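Eqn. (4.43) can be sketched as a generic loop that gathers the $n$ values at each position and hands them to $f$. The sketch below assumes equally sized images stored as flat int[] arrays. The class PointwiseCombination and its nested interface PixelFunction are hypothetical names.

```java
/** Hypothetical sketch of Eqn. (4.43): I'(u,v) = f(I_1(u,v), ..., I_n(u,v)). */
final class PointwiseCombination {

    /** The function f(a_1, ..., a_n); the name is illustrative only. */
    interface PixelFunction {
        int apply(int[] values);
    }

    /** Combines equally sized images, given as flat pixel arrays, position by position. */
    static int[] combine(PixelFunction f, int[]... images) {
        int size = images[0].length;
        for (int[] img : images) {
            if (img.length != size) {
                throw new IllegalArgumentException("images must have identical size");
            }
        }
        int[] result = new int[size];
        int[] values = new int[images.length];   // a_1, ..., a_n at the current position
        for (int i = 0; i < size; i++) {
            for (int k = 0; k < images.length; k++) {
                values[k] = images[k][i];
            }
            result[i] = f.apply(values);
        }
        return result;
    }
}
```

For instance, a function that returns the integer mean of its arguments averages n images. A function that returns min(255, a1 + a2) reproduces the addition of Eqn. (4.42) for 8-bit images, with clamping.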

4.8.4  Methods  for  Point  Operations  on  Two  Images 

ImageJ supplies a single method for implementing arithmetic operations on pairs of images,

copyBits(ImageProcessor ip2, int u, int v, int mode),

which applies the binary operation specified by the transfer mode parameter mode to all pixel pairs taken from the source image ip2 and the target image (the image on which this method is invoked) and stores the result in the target image. Here u, v are the coordinates where the source image is inserted into the target image (usually u = v = 0). The following code segment demonstrates the addition of two images:

ImageProcessor ip1 = ...  // target image (I_1)
ImageProcessor ip2 = ...  // source image (I_2)
...
ip1.copyBits(ip2, 0, 0, Blitter.ADD);  // I_1 ← I_1 + I_2
// ip1 holds the result, ip2 is unchanged


In this operation, the target image ip1 is destructively modified, while the source image ip2 remains unchanged. The constant ADD is one of several arithmetic transfer modes defined by the Blitter interface (see Table 4.3). In addition, Blitter defines (bitwise) logical operations, such as OR and AND. For arithmetic operations, the copyBits() method limits the results to the admissible range of pixel values (of the target image). Also note that (except for target images of type FloatProcessor) the results are not rounded but truncated to integer values.
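The saturating behavior of copyBits() described above can be mimicked in plain Java. The following is a minimal sketch of those semantics for 8-bit images, not ImageJ's implementation; the class BlitSketch, its two-value Mode enum, and the int[][] image layout indexed [v][u] are hypothetical. The AVERAGE branch only serves to show truncation by integer division.

```java
// Sketch of the behavior described for copyBits() with an arithmetic transfer
// mode on 8-bit images. NOT ImageJ code: it only mimics the described semantics.
// The target is modified in place, the source is inserted at offset (u, v),
// and results are limited to the range [0, 255].
class BlitSketch {

    enum Mode { ADD, AVERAGE } // hypothetical subset of transfer modes

    static void copyBits(int[][] target, int[][] source, int u, int v, Mode mode) {
        for (int j = 0; j < source.length; j++) {
            for (int i = 0; i < source[j].length; i++) {
                int x = u + i, y = v + j; // corresponding position in the target
                if (y < 0 || y >= target.length || x < 0 || x >= target[y].length) {
                    continue; // source pixels falling outside the target are ignored
                }
                int a = target[y][x];
                int b = source[j][i];
                int r = (mode == Mode.ADD) ? a + b : (a + b) / 2; // integer division truncates
                target[y][x] = Math.max(0, Math.min(255, r));    // clamp to 8-bit range
            }
        }
    }
}
```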


Table 4.3

Arithmetic operations and corresponding transfer mode constants for ImageProcessor's copyBits() method. Example: ip1.copyBits(ip2, 0, 0, Blitter.ADD).


4.8.5  ImageJ  Plugins  Involving  Multiple  Images 

ImageJ provides two types of plugin: a generic plugin (PlugIn), which can be run without any open image, and plugins of type PlugInFilter, which apply to a single image. In the latter case, the currently active image is passed as an object of type ImageProcessor (or any of its subclasses) to the plugin's run() method (see also Sec. 2.2.3).

If two or more images $I_1, I_2, \ldots, I_k$ are to be combined by a plugin program, only a single image $I_1$ can be passed directly to the plugin's run() method, but not the additional images $I_2, \ldots, I_k$. The usual solution is to make the plugin open a dialog window to let the user select the remaining images interactively. This is demonstrated in the following example plugin for transparently blending two images.

Example:  Linear  blending 

Linear blending is a simple method for continuously mixing two images, $I_{BG}$ and $I_{FG}$. The background image $I_{BG}$ is covered by the foreground image $I_{FG}$, whose transparency is controlled by the value $\alpha$ in the form

$$I'(u,v) = \alpha \cdot I_{BG}(u,v) + (1-\alpha) \cdot I_{FG}(u,v), \qquad (4.44)$$

with $0 \leq \alpha \leq 1$. For $\alpha = 0$, the foreground image $I_{FG}$ is nontransparent (opaque) and thus entirely hides the background image $I_{BG}$. Conversely, the image $I_{FG}$ is fully transparent for $\alpha = 1$ and only $I_{BG}$ is visible. All $\alpha$ values between 0 and 1 result in a weighted sum of the corresponding pixel values taken from $I_{BG}$ and $I_{FG}$ (Eqn. (4.44)).
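Eqn. (4.44) can also be evaluated directly on pixel arrays. The sketch below assumes 8-bit images stored as int[][] arrays of identical size (a hypothetical representation, not ImageJ's) and rounds only once at the end. Prog. 4.5 instead works on ImageProcessor objects, where each call to multiply() rounds its own intermediate result, so individual pixel values may differ slightly.

```java
// Sketch of linear blending, Eqn. (4.44), on 8-bit images stored as int[][].
// The class name and array representation are assumptions for illustration.
class LinearBlendSketch {

    static int[][] blend(int[][] bg, int[][] fg, double alpha) {
        if (alpha < 0 || alpha > 1) {
            throw new IllegalArgumentException("alpha must be in [0, 1]");
        }
        int height = bg.length, width = bg[0].length;
        int[][] result = new int[height][width];
        for (int v = 0; v < height; v++) {
            for (int u = 0; u < width; u++) {
                // I'(u,v) = alpha * I_BG(u,v) + (1 - alpha) * I_FG(u,v)
                double val = alpha * bg[v][u] + (1 - alpha) * fg[v][u];
                result[v][u] = (int) Math.round(val); // single rounding step
            }
        }
        return result;
    }
}
```

With alpha = 0 the result equals the foreground image, with alpha = 1 the background image, matching the discussion above.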

Figure 4.23 shows the results of linear blending for different $\alpha$ values. The Java code for the corresponding implementation (as an ImageJ plugin) is listed in Prog. 4.5. The background image (ipBG) is passed directly to the plugin's run() method. The second (foreground) image and the $\alpha$ value are specified interactively by creating an instance of the ImageJ class GenericDialog, which allows the simple implementation of dialog windows with various types of input fields.
Fig. 4.23

Linear blending example. Foreground image $I_{FG}$ (a) and background image $I_{BG}$ (e); blended images for transparency values $\alpha$ = 0.25, 0.50, and 0.75 (b-d) and dialog window (f) produced by GenericDialog (see Prog. 4.5).
4.9 Exercises

Exercise 4.1. Implement the auto-contrast operation as defined in Eqns. (4.9)-(4.11) as an ImageJ plugin for an 8-bit grayscale image. Set the quantile $p$ of pixels to be saturated at both ends of the intensity range (0 and 255) to $p = p_{lo} = p_{hi} = 1\%$.


Exercise  4.2.  Modify  the  histogram  equalization  plugin  in  Prog.  4.2 
to  use  a  lookup  table  (Sec.  4.8.1)  for  computing  the  point  operation. 


Exercise 4.3. Implement the histogram equalization as defined in Eqn. (4.12), but use the modified cumulative histogram defined in Eqn. (4.13), cumulating the square root of the histogram entries. Compare the results to the standard (linear) approach by plotting the resulting histograms and cumulative histograms as shown in Fig. 4.10.

Exercise 4.4. Show formally that (a) a linear histogram equalization (Eqn. (4.12)) does not change an image that already has a uniform intensity distribution and (b) any repeated application of histogram equalization to the same image causes no more changes.

Exercise  4.5.  Show  that  the  linear  histogram  equalization  (Sec.  4.5) 
is  only  a  special  case  of  histogram  specification  (Sec.  4.6). 

Exercise 4.6. Implement the histogram specification using a piecewise linear reference distribution function, as described in Sec. 4.6.3. Define a new object class with all necessary instance variables to represent the distribution function and implement the required functions $P_L(i)$ (Eqn. (4.23)) and $P_L^{-1}(b)$ (Eqn. (4.24)) as methods of this class.

Exercise  4.7.  Using  a  histogram  specification  for  adjusting  multiple 
images  (Sec.  4.6.4),  one  could  either  use  one  typical  image  as  the 
reference  or  compute  an  “average”  reference  histogram  from  a  set 
of  images.  Implement  the  second  approach  and  discuss  its  possible 
advantages  (or  disadvantages). 

Exercise 4.8. Implement the modified gamma correction (see Eqn. (4.34)) as an ImageJ plugin with variable values for $\gamma$ and $a_0$ using a lookup table as shown in Prog. 4.4.

Exercise 4.9. Show that the modified gamma correction function $\bar{f}_{\gamma,a_0}(a)$, with the parameters defined in Eqns. (4.34)-(4.35), is $C^1$-continuous (i.e., both the function itself and its first derivative are continuous).
Prog. 4.5

ImageJ plugin (Linear_Blending). A background image is transparently blended with a selected foreground image. The plugin is applied to the (currently active) background image, and the foreground image must also be open when the plugin is started. The background image (ipBG), which is passed to the plugin's run() method, is multiplied with $\alpha$ (line 22). The foreground image (fgIm, selected in part 2) is first duplicated (line 20) and then multiplied with $(1 - \alpha)$ (line 21). Thus the original foreground image is not modified. The final result is obtained by adding the two weighted images (line 23). To select the foreground image, a list of currently open images and image titles is obtained (lines 30-32). Then a dialog object (of type GenericDialog) is created and opened for specifying the foreground image (fgIm) and the $\alpha$ value (lines 36-46).


 1 import ij.ImagePlus;
 2 import ij.gui.GenericDialog;
 3 import ij.plugin.filter.PlugInFilter;
 4 import ij.process.Blitter;
 5 import ij.process.ImageProcessor;
 6 import imagingbook.lib.ij.IjUtils;
 7
 8 public class Linear_Blending implements PlugInFilter {
 9   static double alpha = 0.5;  // transparency of foreground image
10   ImagePlus fgIm;             // foreground image (to be selected)
11
12   public int setup(String arg, ImagePlus im) {
13     return DOES_8G;
14   }
15
16   public void run(ImageProcessor ipBG) {  // ipBG = I_BG
17     if (runDialog()) {
18       ImageProcessor ipFG =               // ipFG = I_FG
19           fgIm.getProcessor().convertToByte(false);
20       ipFG = ipFG.duplicate();
21       ipFG.multiply(1 - alpha);           // I_FG <- I_FG * (1 - alpha)
22       ipBG.multiply(alpha);               // I_BG <- I_BG * alpha
23       ipBG.copyBits(ipFG, 0, 0, Blitter.ADD);  // I_BG <- I_BG + I_FG
24     }
25   }
26
27   boolean runDialog() {
28     // get list of open images and their titles:
29     ImagePlus[] openImages = IjUtils.getOpenImages(true);
30     String[] imageTitles = new String[openImages.length];
31     for (int i = 0; i < openImages.length; i++) {
32       imageTitles[i] = openImages[i].getShortTitle();
33     }
34
35     // create the dialog and show:
36     GenericDialog gd =
37         new GenericDialog("Linear Blending");
38     gd.addChoice("Foreground image:",
39         imageTitles, imageTitles[0]);
40     gd.addNumericField("Alpha value [0..1]:", alpha, 2);
41     gd.showDialog();
42     if (gd.wasCanceled())
43       return false;
44     else {
45       fgIm = openImages[gd.getNextChoiceIndex()];
46       alpha = gd.getNextNumber();
47       return true;
48     }
49   }
50 }
Filters


The  essential  property  of  point  operations  (discussed  in  the  previous 
chapter)  is  that  each  new  pixel  value  only  depends  on  the  original 
pixel  at  the  same  position.  The  capabilities  of  point  operations  are 
limited,  however.  For  example,  they  cannot  accomplish  the  task  of 
sharpening  or  smoothing  an  image  (Fig.  5.1).  This  is  what  filters 
can  do.  They  are  similar  to  point  operations  in  the  sense  that  they 
also  produce  a  1:1  mapping  of  the  image  coordinates,  that  is,  the 
geometry  of  the  image  does  not  change. 


Fig.  5.1 

No  point  operation  can  blur  or 
sharpen  an  image.  This  is  an 
example  of  what  filters  can  do. 
Like  point  operations,  filters 
do  not  modify  the  geometry  of 
an  image. 


5.1  What  is  a  Filter? 

The main difference between filters and point operations is that filters generally use more than one pixel from the source image for computing each new pixel value. Let us first take a closer look at the task of smoothing an image. Images look sharp primarily at places where the local intensity rises or drops sharply (i.e., where the difference between neighboring pixels is large). On the other hand, we perceive an image as blurred or fuzzy where the local intensity function is smooth.

A first idea for smoothing an image could thus be to simply replace every pixel by the average of its neighboring pixels. To determine the new pixel value in the smoothed image $I'(u,v)$, we use the original pixel $I(u,v) = p_0$ at the same position plus its eight neighboring pixels $p_1, p_2, \ldots, p_8$ to compute the arithmetic mean of these nine values,


I  (r,  L?)  i — 


Po  +  Pi  +  P2  +  P3  +  Pa  +  Pb  +  P6  +  P7  +  Ps 

9 


(5.1) 


Expressed  in  relative  image  coordinates  this  is 


$$\begin{aligned} I'(u,v) \leftarrow \frac{1}{9} \cdot \bigl[\, & I(u-1,v-1) + I(u,v-1) + I(u+1,v-1) \;+ \\ & I(u-1,v) + I(u,v) + I(u+1,v) \;+ \\ & I(u-1,v+1) + I(u,v+1) + I(u+1,v+1) \,\bigr], \end{aligned} \qquad (5.2)$$


which  we  can  write  more  compactly  in  the  form 


$$I'(u,v) \leftarrow \frac{1}{9} \cdot \sum_{j=-1}^{1} \sum_{i=-1}^{1} I(u+i, v+j). \qquad (5.3)$$


This simple local averaging already exhibits all the important elements of a typical filter. In particular, it is a so-called linear filter, which is a very important class of filters. But how are filters defined in general? First, they differ from point operations mainly by using not a single source pixel but a set of them for computing each resulting pixel. The coordinates of the source pixels are fixed relative to the current image position $(u,v)$ and usually form a contiguous region, as illustrated in Fig. 5.2.


Fig. 5.2

Principal filter operation. Each new pixel value $I'(u,v)$ is calculated as a function of the pixel values within a specified region of source pixels $R_{u,v}$ in the original image $I$.


The size of the filter region is an important parameter of the filter because it specifies how many original pixels contribute to each resulting pixel value and thus determines the spatial extent (support) of the filter. For example, the smoothing filter in Eqn. (5.2) uses a 3 × 3 region of support that is centered at the current coordinate $(u,v)$. Similar filters with larger support, such as 5 × 5, 7 × 7, or even 21 × 21 pixels, would obviously have stronger smoothing effects.

The shape of the filter region is not necessarily square or even rectangular. In fact, a circular (disk-shaped) region would be preferred to obtain an isotropic blur effect (i.e., one that is the same in all image directions). Another option is to assign different weights to the pixels in the support region, such as to give stronger emphasis to pixels that are closer to the center of the region. Furthermore, the support region of a filter does not need to be contiguous and may not even contain the original pixel itself (imagine a ring-shaped filter region, for example). Theoretically the filter region could even be of infinite size.

It is probably confusing to have so many options; a more systematic method is needed for specifying and applying filters in a targeted manner. The traditional and proven classification into linear and nonlinear filters is based on the mathematical properties of the filter function, that is, whether the result is computed from the source pixels by a linear or a nonlinear expression. In the following, we discuss both classes of filters and show several practical examples.


5.2  Linear  Filters 

Linear filters are denoted that way because they combine the pixel values in the support region in a linear fashion, that is, as a weighted summation. The local averaging process discussed in the beginning (Eqn. (5.3)) is a special example, where all nine pixels in the 3 × 3 support region are added with identical weights (1/9). With the same mechanism, a multitude of filters with different properties can be defined by simply modifying the distribution of the individual weights.


5.2.1  The  Filter  Kernel 


For  any  linear  filter,  the  size  and  shape  of  the  support  region,  as  well 
as  the  individual  pixel  weights,  are  specified  by  the  “filter  kernel”  or 
“filter  matrix”  H(i,j).  The  size  of  the  kernel  H  equals  the  size  of 
the  filter  region,  and  every  element  H(i,j)  specifies  the  weight  of  the 
corresponding  pixel  in  the  summation.  For  the  3x3  smoothing  filter 
in  Eqn.  (5.3),  the  filter  kernel  is 


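$$H = \begin{bmatrix} 1/9 & 1/9 & 1/9 \\ 1/9 & 1/9 & 1/9 \\ 1/9 & 1/9 & 1/9 \end{bmatrix} = \frac{1}{9} \cdot \begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix}, \qquad (5.4)$$

because each of the nine pixels contributes one-ninth of its value to the result.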

In principle, the filter kernel $H(i,j)$ is, just like the image itself, a discrete, 2D, real-valued function, $H\colon \mathbb{Z} \times \mathbb{Z} \to \mathbb{R}$. The filter has its own coordinate system with the origin (often referred to as the "hot spot") mostly (but not necessarily) located at the center. Thus, filter coordinates can be both positive and negative (Fig. 5.3). The filter function is of infinite extent and considered zero outside the region defined by the matrix $H$.
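The index bookkeeping implied by a centered hot spot can be made explicit. In this sketch (a hypothetical helper, assuming a kernel with an odd number of rows and columns stored row by row), filter coordinates $(i,j)$ map to array indices [j + L][i + K], and $H(i,j) = 0$ is returned outside the matrix, as stated above.

```java
// Sketch: kernel H stored as a (2L+1) x (2K+1) array (rows x columns) with its
// hot spot at the center. Filter coordinates (i, j), -K <= i <= K, -L <= j <= L,
// map to array indices [j + L][i + K]; i is the column, j the row index.
class KernelAccess {

    static double h(double[][] H, int i, int j) {
        int K = H[0].length / 2; // horizontal half-width
        int L = H.length / 2;    // vertical half-height
        if (i < -K || i > K || j < -L || j > L) {
            return 0.0; // the filter function is zero outside the matrix
        }
        return H[j + L][i + K];
    }
}
```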


5.2.2  Applying  the  Filter 

For a linear filter, the result is unambiguously and completely specified by the coefficients of the filter matrix. Applying the filter to an image is a simple process that is illustrated in Fig. 5.4. The following steps are performed at each image position $(u,v)$:
Fig. 5.3

Filter matrix and its coordinate system. The origin (0, 0) is the hot spot; i is the horizontal (column) index, j is the vertical (row) index.

Fig. 5.4

Linear filter operation. The filter kernel H is placed with its origin at position (u, v) on the image I. Each filter coefficient H(i, j) is multiplied with the corresponding image pixel I(u + i, v + j), the results are added, and the final sum is inserted as the new pixel value I'(u, v).


1. The filter kernel $H$ is moved over the original image $I$ such that its origin $H(0,0)$ coincides with the current image position $(u,v)$.

2. All filter coefficients $H(i,j)$ are multiplied with the corresponding image element $I(u+i, v+j)$ and the results are added up.

3. Finally, the resulting sum is stored at the current position in the new image $I'(u,v)$.

Described formally, the pixel values of the new image $I'(u,v)$ are computed by the operation

$$I'(u,v) \leftarrow \sum_{(i,j) \in R_H} I(u+i, v+j) \cdot H(i,j), \qquad (5.5)$$

where $R_H$ denotes the set of coordinates covered by the filter $H$. For a typical 3 × 3 filter with centered origin, this is

$$I'(u,v) \leftarrow \sum_{i=-1}^{1} \sum_{j=-1}^{1} I(u+i, v+j) \cdot H(i,j), \qquad (5.6)$$

for  all  image  coordinates  (u,v).  Not  quite  for  all  coordinates,  to 
be  exact.  There  is  an  obvious  problem  at  the  image  borders  where 
the  filter  reaches  outside  the  image  and  finds  no  corresponding  pixel 
values  to  use  in  computing  a  result.  For  the  moment,  we  ignore  this 
border  problem,  but  we  will  attend  to  it  again  in  Sec.  5.5.2. 
5.2.3 Implementing the Filter Operation


Now that we understand the principal operation of a filter (Fig. 5.4) and know that the borders need special attention, we go ahead and program a simple linear filter in ImageJ. But before we do this, we may want to consider one more detail. In a point operation (e.g., in Progs. 4.1 and 4.2), each new pixel value depends only on the corresponding pixel value in the original image, and it was thus no problem simply to store the results back to the same image; the computation is done "in place" without the need for any intermediate storage. In-place computation is generally not possible for a filter since any original pixel contributes to more than one resulting pixel and thus may not be modified before all operations are complete.

We therefore require additional storage space for the resulting image, which subsequently could be copied back to the source image again (if desired). Thus the complete filter operation can be implemented in two different ways (Fig. 5.5):

A.  The  result  of  the  filter  computation  is  initially  stored  in  a  new 
image  whose  content  is  eventually  copied  back  to  the  original 
image. 

B.  The  original  image  is  first  copied  to  an  intermediate  image  that 
serves  as  the  source  for  the  actual  filter  operation.  The  result 
replaces  the  pixels  in  the  original  image. 

The same amount of storage is required for both versions, and thus neither of them offers a particular advantage. In the following examples, we generally use version B.
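Eqn. (5.6) combined with version B can be sketched on plain arrays as follows. The int[][] image representation indexed [v][u], the class name, and the final clamping step are assumptions of this sketch; as in the text, border pixels where the kernel does not fit are left unchanged.

```java
// Sketch of Eqn. (5.6) using version B: the original image is first copied,
// the copy serves as the source of the filter, and the results replace the
// pixels of the original. Kernel H is 3x3 with its hot spot at H[1][1].
class Filter3x3Sketch {

    static void filterInPlace(int[][] img, double[][] H) {
        int height = img.length, width = img[0].length;
        // intermediate image: a deep copy of the original
        int[][] copy = new int[height][];
        for (int v = 0; v < height; v++) {
            copy[v] = img[v].clone();
        }
        // only positions where the 3x3 kernel fits entirely inside the image
        for (int v = 1; v <= height - 2; v++) {
            for (int u = 1; u <= width - 2; u++) {
                double sum = 0;
                for (int j = -1; j <= 1; j++) {
                    for (int i = -1; i <= 1; i++) {
                        sum += copy[v + j][u + i] * H[j + 1][i + 1];
                    }
                }
                int q = (int) Math.round(sum);
                img[v][u] = Math.max(0, Math.min(255, q)); // clamp as a precaution
            }
        }
    }
}
```

Reading from the copy matters: writing results directly into the array being read would feed already filtered values into the computation of neighboring pixels.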


Fig. 5.5

Practical implementation of in-place filter operations. Version A: The result of the filter operation is first stored in an intermediate image and subsequently copied back to the original image (a). Version B: The original image is first copied to an intermediate image that serves as the source for the filter operation. The results are placed in the original image (b).


5.2.4  Filter  Plugin  Examples 

The  following  examples  demonstrate  the  implementation  of  two  very 
basic  filters  that  are  nevertheless  often  used  in  practice. 

Simple  3x3  averaging  filter  (“box”  filter) 

Program 5.1 shows the ImageJ code for a simple 3 × 3 smoothing filter based on local averaging (Eqn. (5.4)), which is often called a "box" filter because of its box-like shape. No explicit filter matrix is required in this case, since all filter coefficients are identical (1/9). Also, no clamping (see Sec. 4.1.2) of the results is needed because the sum of the filter coefficients is 1 and thus no pixel values outside the admissible range can be created.

Prog. 5.1

3 × 3 averaging "box" filter (Filter_Box_3x3). First (in line 10) a duplicate (copy) of the original image is created, which is used as the source image in the subsequent filter computation (line 18). In line 23, the resulting value is placed in the original image. Notice that the border pixels remain unchanged because they are not reached by the iteration over (u, v).

 1 import ij.ImagePlus;
 2 import ij.plugin.filter.PlugInFilter;
 3 import ij.process.ImageProcessor;
 4
 5 public class Filter_Box_3x3 implements PlugInFilter {
 6   public int setup(String arg, ImagePlus im) { return DOES_8G; } // 8-bit grayscale
 7   public void run(ImageProcessor ip) {
 8     int M = ip.getWidth();
 9     int N = ip.getHeight();
10     ImageProcessor copy = ip.duplicate();
11
12     for (int u = 1; u <= M - 2; u++) {
13       for (int v = 1; v <= N - 2; v++) {
14         // compute filter result for position (u,v):
15         int sum = 0;
16         for (int i = -1; i <= 1; i++) {
17           for (int j = -1; j <= 1; j++) {
18             int p = copy.getPixel(u + i, v + j);
19             sum = sum + p;
20           }
21         }
22         int q = (int) (sum / 9.0);
23         ip.putPixel(u, v, q);
24       }
25     }
26   }
27 }

Although this example implements an extremely simple filter, it nevertheless demonstrates the general structure of a 2D filter program. In particular, four nested loops are needed: two (outer) loops for moving the filter over the image coordinates (u, v) and two (inner) loops to iterate over the (i, j) coordinates within the rectangular filter region. The required amount of computation thus depends not only upon the size of the image but equally on the size of the filter: for an M × N image and a filter with m × n coefficients, the innermost statement is executed roughly M · N · m · n times.


Another  3x3  smoothing  filter 

Instead of the constant weights applied in the previous example, we now use a real filter matrix with variable coefficients. For this purpose, we apply a bell-shaped 3 × 3 filter function which puts more emphasis on the center pixel than the surrounding pixels:

$$H = \begin{bmatrix} 0.075 & 0.125 & 0.075 \\ 0.125 & 0.200 & 0.125 \\ 0.075 & 0.125 & 0.075 \end{bmatrix}. \qquad (5.7)$$
Notice that all coefficients in $H$ are positive and sum to 1 (i.e., the matrix is normalized) such that all results remain within the original range of pixel values. Again no clamping is necessary and the program structure in Prog. 5.2 is virtually identical to the previous example. The filter matrix (H) is represented by a 2D array¹ of type double. Each pixel is multiplied by the corresponding coefficient of the filter matrix, the resulting sum being also of type double. When accessing the filter coefficients, note that the coordinate origin of the filter matrix is assumed to be at its center (i.e., at position (1, 1)) in the case of a 3 × 3 matrix. This explains the offset of 1 for the i and j coordinates (see Prog. 5.2, line 22).

Prog. 5.2

3 × 3 smoothing filter (Filter_Smooth_3x3). The filter matrix is defined as a 2D array of type double (line 7). The coordinate origin of the filter is assumed to be at the center of the matrix (i.e., at the array position [1, 1]), which is accounted for by an offset of 1 for the i, j coordinates in line 22. The results are rounded (line 26) and stored in the original image (line 27).

public void run(ImageProcessor ip) {
  int M = ip.getWidth();
  int N = ip.getHeight();

  // 3x3 filter matrix:
  double[][] H = {
    {0.075, 0.125, 0.075},
    {0.125, 0.200, 0.125},
    {0.075, 0.125, 0.075}};

  ImageProcessor copy = ip.duplicate();

  for (int u = 1; u <= M - 2; u++) {
    for (int v = 1; v <= N - 2; v++) {
      // compute filter result for position (u,v):
      double sum = 0;

      for (int i = -1; i <= 1; i++) {
        for (int j = -1; j <= 1; j++) {
          int p = copy.getPixel(u + i, v + j);
          // get the corresponding filter coefficient:
          double c = H[j + 1][i + 1];
          sum = sum + c * p;
        }
      }
      int q = (int) Math.round(sum);
      ip.putPixel(u, v, q);
    }
  }
}

¹ See the additional comments regarding 2D arrays in Java in Sec. F.2.4 in the Appendix.

Fig. 5.6

Adobe Photoshop's "Custom Filter" implements linear filters up to a size of 5 × 5. The filter's coordinate origin ("hot spot") is assumed to be at the center (value set to 3 in this example), and empty cells correspond to zero coefficients. In addition to the (integer) coefficients, common Scale and Offset values can be specified (see Eqn. (5.11)).

5.2.5 Integer Coefficients

Instead of using floating-point coefficients (as in the previous examples), it is often simpler and usually more efficient to work with integer coefficients in combination with some common scale factor $s$, that is,

$$H(i,j) = s \cdot H'(i,j), \qquad (5.8)$$

with $H'(i,j) \in \mathbb{Z}$ and $s \in \mathbb{R}$. If all filter coefficients are positive (which is the case for any smoothing filter), then $s$ is usually taken as the reciprocal of the sum of the coefficients,

$$s = \frac{1}{\sum_{i,j} H'(i,j)}, \qquad (5.9)$$

to obtain a normalized filter matrix. In this case, the results are bounded to the original range of pixel values. For example, the filter
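matrix in Eqn. (5.7) could be defined equivalently as

$$H = \begin{bmatrix} 0.075 & 0.125 & 0.075 \\ 0.125 & 0.200 & 0.125 \\ 0.075 & 0.125 & 0.075 \end{bmatrix} = \frac{1}{40} \cdot \begin{bmatrix} 3 & 5 & 3 \\ 5 & 8 & 5 \\ 3 & 5 & 3 \end{bmatrix}, \qquad (5.10)$$

with the common scale factor $s = \frac{1}{40} = 0.025$. A similar scaling is used for the filter operation in Prog. 5.3.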
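Eqns. (5.8)-(5.10) reduce to a few lines of arithmetic. This sketch (class and method names are hypothetical) computes $s$ as the reciprocal of the coefficient sum and recovers the real-valued kernel of Eqn. (5.7) from its integer form.

```java
// Sketch of Eqns. (5.8)-(5.10): integer kernel H' plus common scale factor
// s = 1 / (sum of coefficients), yielding a normalized real-valued kernel.
class IntegerKernelSketch {

    static double scaleFactor(int[][] H) {
        int sum = 0;
        for (int[] row : H) {
            for (int c : row) {
                sum += c;
            }
        }
        return 1.0 / sum; // assumes positive coefficients, hence a nonzero sum
    }

    // Real-valued coefficients H(i,j) = s * H'(i,j), cf. Eqn. (5.8).
    static double[][] toReal(int[][] H) {
        double s = scaleFactor(H);
        double[][] result = new double[H.length][];
        for (int j = 0; j < H.length; j++) {
            result[j] = new double[H[j].length];
            for (int i = 0; i < H[j].length; i++) {
                result[j][i] = s * H[j][i];
            }
        }
        return result;
    }
}
```

For the integer matrix of Eqn. (5.10), the coefficient sum is 40, so scaleFactor() returns 0.025 and toReal() reproduces the coefficients of Eqn. (5.7).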

In Adobe Photoshop, linear filters can be specified with the "Custom Filter" tool (Fig. 5.6) using integer coefficients and a common scale factor Scale (which corresponds to the reciprocal of $s$). In addition, a constant Offset value can be specified, for example, to shift negative results (caused by negative coefficients) into the visible range of values. In summary, the operation performed by the 5 × 5 Photoshop custom filter can be expressed as


$$I'(u,v) \leftarrow \mathit{Offset} + \frac{1}{\mathit{Scale}} \cdot \sum_{j=-2}^{2} \sum_{i=-2}^{2} I(u+i, v+j) \cdot H(i,j). \qquad (5.11)$$
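Eqn. (5.11) can be written out for a single image position. This is only a sketch of the formula, not of Photoshop's implementation: the int[][] image layout, the rounding, and the clamping to [0, 255] are assumptions, and the position (u, v) must be at least two pixels away from the image border.

```java
// Sketch of Eqn. (5.11) at one interior position (u, v) of an 8-bit image
// stored as int[height][width]. H is a 5x5 integer kernel with hot spot H[2][2].
// Rounding and clamping are assumptions, not a description of Photoshop.
class CustomFilterSketch {

    static int apply(int[][] img, int u, int v, int[][] H, int scale, int offset) {
        if (scale == 0) {
            throw new IllegalArgumentException("Scale must be nonzero");
        }
        int sum = 0;
        for (int j = -2; j <= 2; j++) {
            for (int i = -2; i <= 2; i++) {
                sum += img[v + j][u + i] * H[j + 2][i + 2];
            }
        }
        long q = offset + Math.round((double) sum / scale);
        return (int) Math.max(0, Math.min(255, q)); // keep result in 8-bit range
    }
}
```

A kernel with +1 and -1 next to each other yields 0 on a flat image region; an Offset of 128 shifts such zero responses to mid-gray.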


5.2.6  Filters  of  Arbitrary  Size 

Small filters of size 3 × 3 are frequently used in practice, but sometimes much larger filters are required. Let us assume that the filter matrix $H$ is centered and has an odd number of $(2K+1)$ columns and $(2L+1)$ rows, with $K, L \geq 0$. If the image is of size $M \times N$, that is,

$$I(u,v) \quad \text{with} \quad 0 \leq u < M \quad \text{and} \quad 0 \leq v < N, \qquad (5.12)$$

then the result of the filter can be calculated for all image coordinates $(u', v')$ with

$$K \leq u' \leq (M-K-1) \quad \text{and} \quad L \leq v' \leq (N-L-1), \qquad (5.13)$$

as illustrated in Fig. 5.7. Program 5.3 (which is adapted from Prog. 5.2) shows a 7 × 5 smoothing filter as an example for implementing linear filters of arbitrary size. This example uses integer-valued filter coefficients (line 6) in combination with a common scale factor $s$, as described already. As usual, the "hot spot" of the filter is assumed to be at the matrix center, and the range of all iterations depends on the dimensions of the filter matrix. In this case, clamping of the results is included (in lines 34-35) as a preventive measure.

Prog. 5.3

Linear filter of arbitrary size using integer coefficients (Filter_Arbitrary). The filter matrix is an integer array of size (2K + 1) × (2L + 1) with the origin at the center element. The summation variable sum is also defined as an integer (int), which is scaled by a constant factor s and rounded in line 32. The border pixels are not modified.

public void run(ImageProcessor ip) {
  int M = ip.getWidth();
  int N = ip.getHeight();

  // filter matrix H of size (2K+1) x (2L+1)
  int[][] H = {
    {0,0,1,1,1,0,0},
    {0,1,1,1,1,1,0},
    {1,1,1,1,1,1,1},
    {0,1,1,1,1,1,0},
    {0,0,1,1,1,0,0}};

  double s = 1.0 / 23; // sum of filter coefficients is 23

  // H[L][K] is the center element of H:
  int K = H[0].length / 2; // K = 3
  int L = H.length / 2;    // L = 2

  ImageProcessor copy = ip.duplicate();

  for (int u = K; u <= M - K - 1; u++) {
    for (int v = L; v <= N - L - 1; v++) {
      // compute filter result for position (u,v):
      int sum = 0;
      for (int i = -K; i <= K; i++) {
        for (int j = -L; j <= L; j++) {
          int p = copy.getPixel(u + i, v + j);
          int c = H[j + L][i + K];
          sum = sum + c * p;
        }
      }
      int q = (int) Math.round(s * sum);
      // clamp result:
      if (q < 0) q = 0;
      if (q > 255) q = 255;
      ip.putPixel(u, v, q);
    }
  }
}
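The admissible coordinate range of Eqn. (5.13) follows directly from the kernel dimensions. The helper below (hypothetical names, not part of ImageJ) returns the loop bounds used in Prog. 5.3 and the number of pixels that are actually modified; for the 7 × 5 kernel of Prog. 5.3 on a 640 × 480 image this gives 634 · 476 positions.

```java
// Sketch of Eqn. (5.13): for a centered kernel with (2K+1) columns and (2L+1)
// rows, the kernel lies fully inside an M x N image only for
// K <= u <= M-K-1 and L <= v <= N-L-1.
class BorderGeometry {

    // Returns {uMin, uMax, vMin, vMax} for kernel H (rows x columns).
    static int[] innerBounds(int M, int N, int[][] H) {
        int K = H[0].length / 2;
        int L = H.length / 2;
        return new int[] {K, M - K - 1, L, N - L - 1};
    }

    // Number of positions processed by the loops of Prog. 5.3.
    static long coveredPositions(int M, int N, int[][] H) {
        int[] b = innerBounds(M, N, H);
        long cols = Math.max(0, b[1] - b[0] + 1);
        long rows = Math.max(0, b[3] - b[2] + 1);
        return cols * rows;
    }
}
```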

Fig. 5.7

Border geometry. The filter can be applied only at locations where the kernel H of size (2K + 1) × (2L + 1) is fully contained in the image (inner rectangle, "full coverage").

5.2.7 Types of Linear Filters

Since the effects of a linear filter are solely specified by the filter matrix (which can take on arbitrary values), an infinite number of different linear filters exists, at least in principle. So how can these filters be used and which filters are suited for a given task? In the following, we briefly discuss two broad classes of linear filters that are of key importance in practice: smoothing filters and difference filters (Fig. 5.8).

Smoothing  filters 

Every  filter  we  have  discussed  so  far  causes  some  kind  of  smoothing. 
In fact, any linear filter with positive-only coefficients is a smoothing filter in a sense, because such a filter merely computes a weighted average of the image pixels within a certain image region.

Box  filter 

This simplest of all smoothing filters, whose 3D shape resembles a box (Fig. 5.8(a)), is a well-known friend already. Unfortunately, the box filter is far from an optimal smoothing filter due to its wild behavior in frequency space, which is caused by the sharp cutoff around its sides. Described in frequency terms, smoothing corresponds to low-pass filtering, that is, effectively attenuating all signal components above a given cutoff frequency (see also Chs. 18-19). The box filter, however, produces strong “ringing” in frequency space and is therefore not considered a high-quality smoothing filter. It may also appear rather ad hoc to assign the same weight to all image pixels in the filter region. Instead, one would probably expect to have stronger emphasis given to pixels near the center of the filter than to the more distant ones. Furthermore, smoothing filters should ideally operate “isotropically” (i.e., uniformly in each direction), which is certainly not the case for the rectangular box filter.

Gaussian  filter 

The filter matrix (Fig. 5.8(b)) of this smoothing filter corresponds to a 2D Gaussian function,

$$H^{G,\sigma}(x,y) = e^{-\frac{x^2+y^2}{2\sigma^2}} = e^{-\frac{r^2}{2\sigma^2}}, \tag{5.14}$$

where σ denotes the width (standard deviation) of the bell-shaped function and r = √(x² + y²) is the distance (radius) from the center. The pixel at the center receives the maximum weight (1.0, which is scaled to the integer value 9 in the matrix shown in Fig. 5.8(b)), and the remaining coefficients drop off smoothly with increasing distance from the center. The Gaussian filter is isotropic if the discrete filter matrix is large enough for a sufficient approximation (at least 5 × 5). As a low-pass filter, the Gaussian is “well-behaved” in frequency space and thus clearly superior to the box filter. The 2D Gaussian filter is separable into a pair of 1D filters (see Sec. 5.3.3), which facilitates its efficient implementation.2


Fig. 5.8

Typical examples of linear filters, illustrated as 3D plots (top), profiles (center), and approximations by discrete filter matrices (bottom). The “box” filter (a) and the Gauss filter (b) are both smoothing filters with all-positive coefficients. The “Laplacian” or “Mexican hat” filter (c) is a difference filter. It computes the weighted difference between the center pixel and the surrounding pixels and thus reacts most strongly to local intensity peaks.


Difference  filters 

If some of the filter coefficients are negative, the filter calculation can be interpreted as the difference of two sums: the weighted sum of all pixels with associated positive coefficients minus the weighted sum of pixels with negative coefficients in the filter region $R_H$, that is,

$$I'(u,v) = \sum_{(i,j)\in R_H^{+}} I(u+i,v+j)\cdot|H(i,j)| \;-\; \sum_{(i,j)\in R_H^{-}} I(u+i,v+j)\cdot|H(i,j)|, \tag{5.15}$$

where $R_H^{+}$ and $R_H^{-}$ denote the partitions of the filter region with positive coefficients H(i,j) > 0 and negative coefficients H(i,j) < 0, respectively. For example, the 5 × 5 Laplace filter in Fig. 5.8(c) computes the difference between the center pixel (with weight 16) and the weighted sum of 12 surrounding pixels (with weights −1 or −2). The remaining 12 pixels have associated zero coefficients and are thus ignored in the computation.
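The split into a positive and a negative partial sum can be checked numerically. A minimal sketch, assuming a small 3 × 3 Laplacian-type kernel chosen only for illustration (not the 5 × 5 kernel of Fig. 5.8(c)); it computes one output value both directly and as the difference of the two sums in Eqn. (5.15). Class and method names are hypothetical.

```java
/** Hypothetical check of Eqn. (5.15): a difference filter as the difference of two weighted sums. */
final class DifferenceFilter {

    /** Direct filter response at (u,v) for a 3x3 kernel indexed H[j + 1][i + 1]. */
    static int direct(int[][] I, int[][] H, int u, int v) {
        int sum = 0;
        for (int j = -1; j <= 1; j++)
            for (int i = -1; i <= 1; i++)
                sum += I[v + j][u + i] * H[j + 1][i + 1];
        return sum;
    }

    /** Same response, computed as (sum over R+) minus (sum over R-) using |H(i,j)|. */
    static int asDifference(int[][] I, int[][] H, int u, int v) {
        int pos = 0, neg = 0;
        for (int j = -1; j <= 1; j++) {
            for (int i = -1; i <= 1; i++) {
                int h = H[j + 1][i + 1];
                int p = I[v + j][u + i];
                if (h > 0)      pos += p * Math.abs(h);   // (i,j) in R+
                else if (h < 0) neg += p * Math.abs(h);   // (i,j) in R-
                // positions with zero coefficients are ignored
            }
        }
        return pos - neg;
    }
}
```

With the kernel {{0, −1, 0}, {−1, 4, −1}, {0, −1, 0}}, a single bright pixel of value 10 on a zero background yields 40 by either method, so the response is largest where the center pixel stands out from its neighbors.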

While  local  intensity  variations  are  smoothed  by  averaging,  we  can 
expect  the  exact  contrary  to  happen  when  differences  are  taken:  local 
intensity  changes  are  enhanced.  Important  applications  of  difference 
filters  thus  include  edge  detection  (Sec.  6.2)  and  image  sharpening 
(Sec.  6.6). 


5.3  Formal  Properties  of  Linear  Filters 

In the previous sections, we have approached the concept of filters in a rather casual manner to quickly get a grasp of how filters are defined and used. While such a level of treatment may be sufficient for most practical purposes, the power of linear filters may not really be apparent yet considering the limited range of (simple) applications seen so far.

2 See also Sec. E in the Appendix.

The real importance of linear filters (and perhaps their formal elegance) only becomes visible when taking a closer look at some of the underlying theoretical details. At this point, it may be surprising to the experienced reader that we have not mentioned the term “convolution” in this context yet. We make up for this in the remaining parts of this section.


5.3.1  Linear  Convolution 

The operation associated with a linear filter, as described in the previous section, is not an invention of digital image processing but has been known in mathematics for a long time. It is called linear convolution3 and in general combines two functions of the same dimensionality, either continuous or discrete. For discrete, 2D functions I and H, the convolution operation is defined as

$$I'(u,v) = \sum_{i=-\infty}^{\infty} \sum_{j=-\infty}^{\infty} I(u-i, v-j)\cdot H(i,j), \tag{5.16}$$

or, expressed with the designated convolution operator (*) in the form

$$I' = I * H. \tag{5.17}$$

This almost looks the same as Eqn. (5.5), with two differences: the range of the variables i, j in the summation and the negative signs in the coordinates of I(u − i, v − j). The first point is easy to explain: because the coefficients outside the filter matrix (also referred to as a filter kernel) are assumed to be zero, the positions outside the matrix are irrelevant in the summation. To resolve the coordinate issue, we restrict the summation in Eqn. (5.16) to the filter region $R_H$ and then replace the summation variables i, j by −i, −j:


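$$
\begin{aligned}
I'(u,v) &= \sum_{(i,j)\in R_H} I(u-i, v-j)\cdot H(i,j) && (5.18)\\
&= \sum_{(i,j)\in R_H} I(u+i, v+j)\cdot H(-i,-j) && (5.19)\\
&= \sum_{(i,j)\in R_H} I(u+i, v+j)\cdot H^{*}(i,j). && (5.20)
\end{aligned}
$$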


The result is identical to the linear filter in Eqn. (5.5), with H*(i,j) = H(−i,−j) being the horizontally and vertically reflected (i.e., rotated by 180°) kernel H. To be precise, the operation in Eqn. (5.5) actually defines the linear correlation, which is merely a convolution with a reflected filter matrix.4
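The relationship between correlation (Eqn. (5.5)) and convolution (Eqn. (5.18)) can be verified for a single image position. A minimal sketch, assuming the image is indexed I[v][u] and the kernel H[j + L][i + K]; class and method names are hypothetical.

```java
/** Hypothetical check: convolution (Eqn. 5.18) equals correlation (Eqn. 5.5) with the reflected kernel H*. */
final class ConvVsCorr {

    /** Linear correlation at (u,v): sum of I(u+i, v+j) * H(i,j). */
    static int correlate(int[][] I, int[][] H, int u, int v) {
        int L = H.length / 2, K = H[0].length / 2, sum = 0;
        for (int j = -L; j <= L; j++)
            for (int i = -K; i <= K; i++)
                sum += I[v + j][u + i] * H[j + L][i + K];
        return sum;
    }

    /** Linear convolution at (u,v): sum of I(u-i, v-j) * H(i,j). */
    static int convolve(int[][] I, int[][] H, int u, int v) {
        int L = H.length / 2, K = H[0].length / 2, sum = 0;
        for (int j = -L; j <= L; j++)
            for (int i = -K; i <= K; i++)
                sum += I[v - j][u - i] * H[j + L][i + K];
        return sum;
    }

    /** H*(i,j) = H(-i,-j): the kernel rotated by 180 degrees. */
    static int[][] reflect(int[][] H) {
        int rows = H.length, cols = H[0].length;
        int[][] R = new int[rows][cols];
        for (int y = 0; y < rows; y++)
            for (int x = 0; x < cols; x++)
                R[y][x] = H[rows - 1 - y][cols - 1 - x];
        return R;
    }
}
```

For the asymmetric kernel {{1, 2, 0}, {0, 0, 0}, {0, 0, −1}} applied at the center of {{1, 2, 3}, {4, 5, 6}, {7, 8, 9}}, correlation gives −4 and convolution gives 24, and correlating with reflect(H) reproduces 24. For kernels that are unchanged by a 180° rotation, such as the box and Gaussian filters, H* = H and the two operations coincide.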

3 Oddly enough, the simple concept of convolution is often (though unjustly) feared as an intractable mystery.

4 Of course this is the same in the 1D case. Linear correlation is typically used for comparing images or subpatterns (see Sec. 23.1 for details).
Fig. 5.9

Convolution as a “black box” operation. The original image I is subjected to a linear convolution (*) with the convolution kernel H, producing the output image I'.


Thus the mathematical concept underlying all linear filters is the convolution operation (*), and its results are completely and sufficiently specified by the convolution matrix (or kernel) H. To illustrate this relationship, the convolution is often pictured as a “black box” operation, as shown in Fig. 5.9.

5.3.2  Formal  Properties  of  Linear  Convolution 

The importance of linear convolution is based on its simple mathematical properties as well as its multitude of manifestations and
applications.  Linear  convolution  is  a  suitable  model  for  many  types 
of  natural  phenomena,  including  mechanical,  acoustic,  and  optical 
systems.  In  particular  (as  shown  in  Ch.  18),  there  are  strong  formal 
links  to  the  Fourier  representation  of  signals  in  the  frequency  domain 
that  are  extremely  valuable  for  understanding  complex  phenomena, 
such  as  sampling  and  aliasing.  In  the  following,  however,  we  first  look 
at  some  important  properties  of  linear  convolution  in  the  accustomed 
“signal”  or  image  space. 

Commutativity 

Linear convolution is commutative; that is, for any image I and filter kernel H,

$$I * H = H * I. \tag{5.21}$$

Thus the result is the same if the image and filter kernel are interchanged, and it makes no difference if we convolve the image I with
the  kernel  H  or  the  other  way  around.  The  two  functions  I  and  H 
are  interchangeable  and  may  assume  either  role. 
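For finite 1D sequences that are taken as zero outside their index range, commutativity is easy to confirm numerically. A minimal sketch with hypothetical names, computing the full discrete convolution (f * g)(n) = Σₖ f(k) · g(n − k):

```java
import java.util.Arrays;

/** Hypothetical demo of Eqn. (5.21): I * H = H * I for finite 1D sequences. */
final class Commutativity {

    /** Full discrete convolution; both inputs are taken as zero outside their index range. */
    static int[] convolve(int[] f, int[] g) {
        int[] out = new int[f.length + g.length - 1];
        for (int n = 0; n < out.length; n++) {
            int sum = 0;
            for (int k = 0; k < f.length; k++) {
                int m = n - k;                          // matching index into g
                if (m >= 0 && m < g.length) sum += f[k] * g[m];
            }
            out[n] = sum;
        }
        return out;
    }

    public static void main(String[] args) {
        int[] I = {3, 1, 4, 1, 5};
        int[] H = {1, 2, 1};
        System.out.println(Arrays.toString(convolve(I, H)));   // [3, 7, 9, 10, 11, 11, 5]
        System.out.println(Arrays.toString(convolve(H, I)));   // [3, 7, 9, 10, 11, 11, 5]
    }
}
```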

Linearity 

Linear filters are so called because of the linearity properties of the convolution operation, which manifest themselves in various ways. For example, if an image is multiplied by a scalar constant s ∈ ℝ, then the result of the convolution is multiplied by the same factor, that is,

$$(s\cdot I) * H = I * (s\cdot H) = s\cdot(I * H). \tag{5.22}$$


Similarly, if we add two images I₁, I₂ pixel by pixel and convolve the resulting image with some kernel H, the same outcome is obtained by convolving each image individually and adding the two results afterward, that is,

$$(I_1 + I_2) * H = (I_1 * H) + (I_2 * H). \tag{5.23}$$

It may be surprising, however, that simply adding a constant (scalar) value b to the image does not, in general, add to the convolved result by the same amount,

$$(b + I) * H \neq b + (I * H), \tag{5.24}$$

and is thus not part of the linearity property. Since (b + I) * H = b · Σ H(i,j) + (I * H), the two sides agree only for a kernel whose coefficients sum to 1. While linearity is an important theoretical property, one should note that in practice “linear” filters are often only partially linear because of rounding errors or a limited range of output values.
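Eqns. (5.22) to (5.24) can be tried out on short 1D integer sequences. A minimal sketch with hypothetical helper names, using full discrete convolution with zero values outside each sequence and the unnormalized kernel H = (1, 2, 1), whose coefficients sum to 4:

```java
import java.util.Arrays;

/** Hypothetical demo of Eqns. (5.22)-(5.24) on finite 1D integer sequences. */
final class Linearity {

    /** Full discrete convolution; values outside each array are taken as zero. */
    static int[] convolve(int[] f, int[] g) {
        int[] out = new int[f.length + g.length - 1];
        for (int k = 0; k < f.length; k++)
            for (int m = 0; m < g.length; m++)
                out[k + m] += f[k] * g[m];   // f(k) contributes to position k + m with weight g(m)
        return out;
    }

    /** Multiplies every element of f by s. */
    static int[] scale(int s, int[] f) {
        int[] out = new int[f.length];
        for (int i = 0; i < f.length; i++) out[i] = s * f[i];
        return out;
    }

    /** Pointwise sum of two sequences of equal length. */
    static int[] add(int[] f, int[] g) {
        int[] out = new int[f.length];
        for (int i = 0; i < f.length; i++) out[i] = f[i] + g[i];
        return out;
    }

    /** Adds the constant b to every element of f. */
    static int[] addConstant(int b, int[] f) {
        int[] out = new int[f.length];
        for (int i = 0; i < f.length; i++) out[i] = b + f[i];
        return out;
    }

    public static void main(String[] args) {
        int[] I1 = {3, 1, 4, 1, 5}, I2 = {2, 7, 1, 8, 2}, H = {1, 2, 1};
        // homogeneity, Eqn. (5.22): prints true
        System.out.println(Arrays.equals(convolve(scale(3, I1), H), scale(3, convolve(I1, H))));
        // additivity, Eqn. (5.23): prints true
        System.out.println(Arrays.equals(convolve(add(I1, I2), H),
                                         add(convolve(I1, H), convolve(I2, H))));
        // adding a constant, Eqn. (5.24): prints false
        System.out.println(Arrays.equals(convolve(addConstant(10, I1), H),
                                         addConstant(10, convolve(I1, H))));
    }
}
```

In the third comparison, each output value with full kernel coverage picks up 10 · 4 = 40 instead of 10.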

Associativity 

Linear convolution is associative, meaning that the order of successive filter operations is irrelevant, that is,

$$(I * H_1) * H_2 = I * (H_1 * H_2). \tag{5.25}$$

Thus multiple successive filters can be applied in any order, and multiple filters can be arbitrarily combined into new filters.
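Combining two kernels into one can be sketched in 1D as well. A minimal sketch with hypothetical names and kernels chosen only for illustration: H₁ = (1, 1) and H₂ = (1, −1) merge into the single kernel H₁ * H₂ = (1, 0, −1), and filtering once with it gives the same result as filtering twice.

```java
import java.util.Arrays;

/** Hypothetical demo of Eqn. (5.25): (I * H1) * H2 = I * (H1 * H2) for finite 1D sequences. */
final class Associativity {

    /** Full discrete convolution; values outside each array are taken as zero. */
    static int[] convolve(int[] f, int[] g) {
        int[] out = new int[f.length + g.length - 1];
        for (int k = 0; k < f.length; k++)
            for (int m = 0; m < g.length; m++)
                out[k + m] += f[k] * g[m];
        return out;
    }

    public static void main(String[] args) {
        int[] I = {3, 1, 4, 1, 5};
        int[] H1 = {1, 1}, H2 = {1, -1};
        int[] twoPasses = convolve(convolve(I, H1), H2);   // filter twice
        int[] combined  = convolve(H1, H2);                // [1, 0, -1]
        int[] onePass   = convolve(I, combined);           // filter once
        System.out.println(Arrays.toString(twoPasses));    // [3, 1, 1, 0, 1, -1, -5]
        System.out.println(Arrays.toString(onePass));      // [3, 1, 1, 0, 1, -1, -5]
    }
}
```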

5.3.3  Separability  of  Linear  Filters 

A direct consequence of associativity is the separability of linear filters. If a convolution kernel H can be expressed as the convolution of multiple kernels Hᵢ in the form

$$H = H_1 * H_2 * \ldots * H_n, \tag{5.26}$$

then (as a consequence of Eqn. (5.25)) the filter operation I * H may be performed as a sequence of convolutions with the constituting kernels Hᵢ,

$$I * H = I * (H_1 * H_2 * \ldots * H_n) = (\ldots((I * H_1) * H_2) * \ldots * H_n). \tag{5.27}$$

Depending upon the type of decomposition, this may result in significant computational savings.


x/y  separability 

The possibility of separating a 2D kernel H into a pair of 1D kernels hx, hy is of particular relevance and is used in many practical applications. Let us assume, as a simple example, that the filter is composed of the 1D kernels hx and hy, with

$$h_x = \begin{bmatrix} 1 & 1 & 1 & 1 & 1 \end{bmatrix} \quad\text{and}\quad h_y = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}, \tag{5.28}$$

respectively. If these filters are applied sequentially to the image I,

$$I' = (I * h_x) * h_y, \tag{5.29}$$

then (according to Eqn. (5.27)) this is equivalent to applying the composite filter

$$H = h_x * h_y = \begin{bmatrix} 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 \end{bmatrix}. \tag{5.30}$$

Thus the 2D 5 × 3 “box” filter H can be constructed from two 1D filters of lengths 5 and 3, respectively (which is obviously true for box filters of any size). But what is the advantage of this? In the aforementioned case, the required amount of processing is 5 · 3 = 15 steps per image pixel for the 2D filter H as compared with 5 + 3 = 8 steps for the two separate 1D filters, a reduction of almost 50%. In general, the number of operations for a 2D filter grows quadratically with the filter size (side length) but only linearly if the filter is x/y-separable. Clearly, separability is a considerable advantage for the implementation of large linear filters (see also Sec. 5.5.1).
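The equivalence of Eqns. (5.29) and (5.30) can be checked on a small test image. A minimal sketch, assuming an image indexed I[v][u] and computing results only where the whole 5 × 3 kernel fits (all other positions are left at 0 in both versions); class and method names are hypothetical.

```java
/** Hypothetical check that the 5 x 3 box filter equals a pass with h_x followed by a pass with h_y. */
final class SeparableBox {

    /** Direct 2D box sum over a (2K+1) x (2L+1) neighborhood (2K+1 wide, 2L+1 high). */
    static int[][] box2d(int[][] I, int K, int L) {
        int N = I.length, M = I[0].length;
        int[][] out = new int[N][M];
        for (int v = L; v < N - L; v++)
            for (int u = K; u < M - K; u++) {
                int sum = 0;                              // (2K+1)(2L+1) additions
                for (int j = -L; j <= L; j++)
                    for (int i = -K; i <= K; i++)
                        sum += I[v + j][u + i];
                out[v][u] = sum;
            }
        return out;
    }

    /** Horizontal pass with h_x (length 2K+1), then vertical pass with h_y (length 2L+1). */
    static int[][] boxSeparable(int[][] I, int K, int L) {
        int N = I.length, M = I[0].length;
        int[][] tmp = new int[N][M];
        for (int v = 0; v < N; v++)
            for (int u = K; u < M - K; u++) {
                int sum = 0;                              // 2K+1 additions
                for (int i = -K; i <= K; i++) sum += I[v][u + i];
                tmp[v][u] = sum;
            }
        int[][] out = new int[N][M];
        for (int v = L; v < N - L; v++)
            for (int u = K; u < M - K; u++) {
                int sum = 0;                              // 2L+1 additions
                for (int j = -L; j <= L; j++) sum += tmp[v + j][u];
                out[v][u] = sum;
            }
        return out;
    }
}
```

With K = 2 and L = 1, box2d touches 15 pixels per output value while boxSeparable uses 5 + 3 = 8 steps, yet both return identical arrays.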


Separable  Gaussian  filters 

In general, a 2D filter is x/y-separable if (as in the earlier example) the filter function H(i,j) can be expressed as the outer product (⊗) of two 1D functions,

$$H(i,j) = (h_x \otimes h_y)(i,j) = h_x(i)\cdot h_y(j), \tag{5.31}$$

because in this case the resulting function also corresponds to the convolution product H = hx * hy. A prominent example is the widely employed 2D Gaussian function Gσ(x,y) (Eqn. (5.14)), which can be expressed as the product

$$
\begin{aligned}
G_\sigma(x,y) &= e^{-\frac{x^2+y^2}{2\sigma^2}} && (5.32)\\
&= \exp\Bigl(-\frac{x^2}{2\sigma^2}\Bigr)\cdot \exp\Bigl(-\frac{y^2}{2\sigma^2}\Bigr) = g_\sigma(x)\cdot g_\sigma(y). && (5.33)
\end{aligned}
$$

Thus a 2D Gaussian filter can be implemented by a pair of 1D Gaussian filters $h^{G}_{x,\sigma}$ and $h^{G}_{y,\sigma}$ as

$$I * H^{G,\sigma} = I * h^{G}_{x,\sigma} * h^{G}_{y,\sigma}. \tag{5.34}$$

The ordering of the two 1D filters is not relevant in this case. With different σ-values along the x and y axes, elliptical 2D Gaussians can be realized as separable filters in the same fashion.
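The factorization in Eqn. (5.33) can be confirmed on sampled kernels. A minimal sketch, assuming square kernels that extend over x, y = −R, ..., R; the class and method names are hypothetical.

```java
/** Hypothetical check of Eqns. (5.31)-(5.33): a sampled 2D Gaussian is the outer product of two 1D Gaussians. */
final class GaussOuter {

    /** Samples g_sigma(x) = exp(-x^2 / (2 sigma^2)) for x = -R..R. */
    static double[] gauss1d(double sigma, int R) {
        double[] g = new double[2 * R + 1];
        for (int x = -R; x <= R; x++)
            g[x + R] = Math.exp(-(x * x) / (2 * sigma * sigma));
        return g;
    }

    /** Samples G_sigma(x,y) = exp(-(x^2 + y^2) / (2 sigma^2)) directly, as in Eqn. (5.14). */
    static double[][] gauss2d(double sigma, int R) {
        double[][] G = new double[2 * R + 1][2 * R + 1];
        for (int y = -R; y <= R; y++)
            for (int x = -R; x <= R; x++)
                G[y + R][x + R] = Math.exp(-(x * x + y * y) / (2 * sigma * sigma));
        return G;
    }

    /** Outer product H(i,j) = hx(i) * hy(j), stored as H[j][i]. */
    static double[][] outer(double[] hx, double[] hy) {
        double[][] H = new double[hy.length][hx.length];
        for (int j = 0; j < hy.length; j++)
            for (int i = 0; i < hx.length; i++)
                H[j][i] = hx[i] * hy[j];
        return H;
    }

    /** Largest absolute element-wise difference between two matrices of equal size. */
    static double maxDiff(double[][] A, double[][] B) {
        double d = 0;
        for (int y = 0; y < A.length; y++)
            for (int x = 0; x < A[y].length; x++)
                d = Math.max(d, Math.abs(A[y][x] - B[y][x]));
        return d;
    }
}
```

Because every entry of the 2D kernel factors this way, filtering with the two 1D kernels in succession gives the same result as filtering with the full 2D kernel, up to floating-point rounding.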

The Gaussian function decays relatively slowly with increasing distance from the center. To avoid visible truncation errors, discrete approximations of the Gaussian should have a sufficiently large extent of about ±2.5σ to ±3.5σ samples. For example, a discrete 2D Gaussian with “radius” σ = 10 requires a minimum filter size of 51 × 51 pixels, in which case the x/y-separable version needs 51 + 51 = 102 instead of 51 · 51 = 2601 operations per pixel and can thus be expected to run about 25 times faster than the full 2D filter. The Java method makeGaussKernel1d() in Prog. 5.4 shows how to dynamically create a 1D Gaussian filter kernel with an extent of ±3σ (i.e., a vector of odd length 6σ + 1). As an example, this method is used for implementing “unsharp masking” filters where relatively large Gaussian kernels may be required (see Prog. 6.1 in Sec. 6.6.2).
Prog. 5.4

Dynamic creation of 1D Gaussian filter kernels. For a given σ, the Java method makeGaussKernel1d() returns a discrete 1D Gaussian filter kernel (float array) large enough to avoid truncation effects.


float[] makeGaussKernel1d(double sigma) {
    // create the 1D kernel h:
    int center = (int) (3.0 * sigma);
    float[] h = new float[2 * center + 1]; // odd size

    // fill the 1D kernel h:
    double sigma2 = sigma * sigma; // σ²
    for (int i = 0; i < h.length; i++) {
        double r = center - i;
        h[i] = (float) Math.exp(-0.5 * (r * r) / sigma2);
    }
    return h;
}


5.3.4  Impulse  Response  of  a  Filter 

Linear convolution is a binary operation involving two functions as its operands; it also has a “neutral element”, which of course is a function, too. The impulse or Dirac function δ() is neutral under convolution, that is,

$$I * \delta = I. \tag{5.35}$$

In the 2D, discrete case, the impulse function is defined as

$$\delta(u,v) = \begin{cases} 1 & \text{for } u = v = 0,\\ 0 & \text{otherwise.} \end{cases} \tag{5.36}$$

Interpreted as an image, this function is merely a single bright pixel (with value 1) at the coordinate origin contained in a dark (zero value) plane of infinite extent (Fig. 5.10).

When the Dirac function is used as the filter kernel in a linear convolution as in Eqn. (5.35), the result is identical to the original image (Fig. 5.11). The reverse situation is more interesting, however, where some filter H is applied to the impulse δ as the input function. What happens? Since convolution is commutative (Eqn. (5.21)) it is evident that

$$H * \delta = \delta * H = H \tag{5.37}$$

and thus the result of this filter operation is identical to the filter H itself (Fig. 5.12)! While sending an impulse into a linear filter to obtain its filter function may seem paradoxical at first, it makes sense if the properties (coefficients) of the filter H are unknown. Assuming that the filter is actually linear, complete information about this filter is obtained by injecting only a single impulse and measuring the result, which is called the “impulse response” of the filter. Among other applications, this technique is used for measuring the behavior of optical systems (e.g., lenses), where a point light source serves as the impulse and the result (a distribution of light energy) is called the “point spread function” (PSF) of the system.

Fig. 5.10

Discrete 2D impulse or Dirac function δ(u, v).

Fig. 5.11

Convolving the image I with the impulse δ returns the original (unmodified) image.

Fig. 5.12

The linear filter H with the impulse δ as the input yields the filter kernel H as the result, that is, I'(u, v) = H(u, v).

Fig. 5.13

Any image structure is blurred by a linear smoothing filter. Important image structures such as step edges (top) or thin lines (bottom) are widened, and local contrast is reduced.
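Measuring an impulse response can be imitated in a few lines. A minimal sketch, assuming a small image indexed I[v][u], pixels outside the image treated as zero, and hypothetical class and method names; the filter is implemented as a true convolution (Eqn. (5.18)).

```java
/** Hypothetical demo of Eqn. (5.37): convolving an impulse with H returns H itself. */
final class ImpulseResponse {

    /** 2D convolution (Eqn. 5.18) with H indexed H[j + L][i + K]; pixels outside I count as zero. */
    static int[][] convolve(int[][] I, int[][] H) {
        int N = I.length, M = I[0].length;
        int L = H.length / 2, K = H[0].length / 2;
        int[][] out = new int[N][M];
        for (int v = 0; v < N; v++)
            for (int u = 0; u < M; u++) {
                int sum = 0;
                for (int j = -L; j <= L; j++)
                    for (int i = -K; i <= K; i++) {
                        int x = u - i, y = v - j;     // note the negative signs of convolution
                        if (x >= 0 && x < M && y >= 0 && y < N)
                            sum += I[y][x] * H[j + L][i + K];
                    }
                out[v][u] = sum;
            }
        return out;
    }

    /** Discrete impulse of size (2R+1) x (2R+1): value 1 at the center, 0 elsewhere. */
    static int[][] impulse(int R) {
        int[][] d = new int[2 * R + 1][2 * R + 1];
        d[R][R] = 1;
        return d;
    }
}
```

Feeding impulse(1) into convolve() with the kernel {{1, 2, 3}, {4, 5, 6}, {7, 8, 9}} returns exactly that kernel. If the black box instead computed the correlation of Eqn. (5.5), the impulse would return the reflected kernel H*.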

5.4  Nonlinear  Filters 

5.4.1  Minimum  and  Maximum  Filters 

Like all other filters, nonlinear filters calculate the result at a given image position (u, v) from the pixels inside the moving region $R_{u,v}$ of the original image. The filters are called “nonlinear” because the source pixel values are combined by some nonlinear function. The simplest of all nonlinear filters are the minimum and maximum filters, defined as

$$
\begin{aligned}
I'_{\min}(u,v) &= \min_{(i,j)\in R} \bigl\{ I(u+i, v+j) \bigr\}, && (5.38)\\
I'_{\max}(u,v) &= \max_{(i,j)\in R} \bigl\{ I(u+i, v+j) \bigr\}, && (5.39)
\end{aligned}
$$

where R denotes the filter region (set of filter coordinates, usually a square of size 3 × 3 pixels). Figure 5.15 illustrates the effects of a 1D minimum filter on various local signal structures.

Fig. 5.14

3 × 3 linear box filter applied to a grayscale image corrupted with salt-and-pepper noise. Original (a), filtered image (b), enlarged details (c, d). Note that the individual noise pixels are only flattened but not removed.

Fig. 5.15

Effects of a 1D minimum filter on different local signal structures. Original signal (top) and result after filtering (bottom), where the colored bars indicate the extent of the filter. The step edge (a) and the linear ramp (c) are shifted to the right by half the filter width, and the narrow pulse (b) is completely removed.
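Two of the effects shown in Fig. 5.15 can be reproduced on short 1D signals. A minimal sketch of a 1D minimum filter with radius r (filter width 2r + 1), assuming that samples outside the signal are simply ignored; the class name is hypothetical.

```java
/** Hypothetical 1D minimum filter of radius r; samples outside the signal are ignored. */
final class MinFilter1D {

    static int[] min(int[] f, int r) {
        int[] out = new int[f.length];
        for (int u = 0; u < f.length; u++) {
            int m = Integer.MAX_VALUE;
            for (int i = -r; i <= r; i++) {
                int k = u + i;
                if (k >= 0 && k < f.length) m = Math.min(m, f[k]);   // smallest value in the window
            }
            out[u] = m;
        }
        return out;
    }
}
```

With r = 1, the narrow pulse {0, 0, 0, 9, 0, 0, 0} becomes all zeros, and the rising step {0, 0, 0, 0, 9, 9, 9, 9} becomes {0, 0, 0, 0, 0, 9, 9, 9}, so the edge moves r = 1 sample to the right.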

Figure 5.16 shows the results of applying 3 × 3 pixel minimum and maximum filters, respectively, to a grayscale image corrupted with “salt-and-pepper” noise (i.e., randomly placed white and black dots). Obviously the minimum filter removes the white (salt) dots, because any single white pixel within the 3 × 3 filter region is replaced by one of its surrounding pixels with a smaller value. Notice, however, that the minimum filter at the same time widens all the dark structures in the image.

The reverse effects can be expected from the maximum filter. Any single bright pixel is a local maximum as soon as it is contained in the filter region R. White dots (and all other bright image structures) are thus widened to the size of the filter, while now the dark (“pepper”) dots disappear.5

Fig. 5.16

Minimum and maximum filters applied to a grayscale image corrupted with “salt-and-pepper” noise (see original in Fig. 5.14(a)). The 3 × 3 minimum filter eliminates the bright dots and widens all dark image structures (a, c). The maximum filter shows the exact opposite effects (b, d).

5.4.2  Median  Filter 

It  is  impossible  of  course  to  design  a  filter  that  removes  any  noise 
but  keeps  all  the  important  image  structures  intact,  because  no  filter 
can  discriminate  which  image  content  is  important  to  the  viewer  and 
which  is  not.  The  popular  median  filter  is  at  least  a  good  step  in  this 
direction. 

5 The image shown in Figs. 5.14 and 5.16, called “Lena” (or “Lenna”), is one of the most popular test images in digital image processing ever and thus of historic interest. The picture of the Swedish “playmate” Lena Sjööblom (Söderberg?), published in Playboy in 1972, was included in a collection of test images at the University of Southern California and was subsequently used by researchers throughout the world (presumably without knowledge of its delicate origin) [115].
The median filter replaces every image pixel by the median of the pixels in the current filter region R, that is,

$$I'(u,v) = \operatorname*{median}_{(i,j)\in R} \bigl\{ I(u+i, v+j) \bigr\}. \tag{5.40}$$


The median of a set of 2n + 1 values $A = \{a_0, \ldots, a_{2n}\}$ can be defined as the center value $a_n$ after arranging (sorting) A to an ordered sequence, that is,

$$\operatorname{median}(\underbrace{a_0, a_1, \ldots, a_{n-1}}_{n\ \text{values}}, a_n, \underbrace{a_{n+1}, \ldots, a_{2n}}_{n\ \text{values}}) = a_n, \tag{5.41}$$

where $a_i \le a_{i+1}$. Figure 5.17 demonstrates the calculation of the median filter of size 3 × 3 (i.e., n = 4).


Fig. 5.17

Calculation of the median. The nine pixel values collected from the 3 × 3 image region are arranged as a vector that is subsequently sorted (A). The center value of A is taken as the median.
Equation (5.41) defines the median of an odd-sized set of values, and if the side length of the rectangular filters is odd (which is usually the case), then the number of elements in the filter region is odd as well. In this case, the median filter does not create any new pixel values that did not exist in the original image. If, however, the number of elements is even, then the median of the sorted sequence $A = (a_0, \ldots, a_{2n-1})$ is defined as the arithmetic mean of the two adjacent center values $a_{n-1}$ and $a_n$,

$$\operatorname{median}(\underbrace{a_0, \ldots, a_{n-1}}_{n\ \text{values}}, \underbrace{a_n, \ldots, a_{2n-1}}_{n\ \text{values}}) = \frac{a_{n-1} + a_n}{2}. \tag{5.42}$$

By averaging $a_{n-1}$ and $a_n$, new pixel values are generally introduced by the median filter if the region is of even size.
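Eqns. (5.41) and (5.42) translate directly into a sort followed by an index lookup. A minimal sketch; the class and method names are hypothetical.

```java
import java.util.Arrays;

/** Hypothetical median following Eqns. (5.41) and (5.42). */
final class Median {

    static double median(int[] values) {
        int[] A = values.clone();   // sort a copy, leaving the caller's array untouched
        Arrays.sort(A);
        int m = A.length;
        if (m % 2 == 1) {
            return A[m / 2];                    // odd size 2n+1: center value a_n
        } else {
            int n = m / 2;                      // even size 2n: mean of a_{n-1} and a_n
            return (A[n - 1] + A[n]) / 2.0;
        }
    }
}
```

For instance, median(new int[]{4, 1, 3, 2}) yields 2.5, a value that does not occur in the input, which is exactly why even-sized regions can introduce new pixel values.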

Figure 5.18 compares the results of median filtering with a linear smoothing filter. Finally, Fig. 5.19 illustrates the effects of a 3 × 3 pixel median filter on selected 2D image structures. In particular, very small structures (smaller than half the filter size) are eliminated, but all other structures remain largely unchanged. A sample Java implementation of the median filter of arbitrary size is shown in Prog. 5.5. The constant r specifies the size of the filter region R, which measures (2r + 1) × (2r + 1) pixels. The number of elements in R (equal to the length of the vector A) is

$$(2r + 1)^2 = 4(r^2 + r) + 1, \tag{5.43}$$

and thus the index of the middle vector element is n = 2(r² + r). Setting r = 1 gives a 3 × 3 median filter (n = 4), r = 2 gives a 5 × 5 filter (n = 12), etc. The structure of this plugin is similar to the arbitrary size linear filter in Prog. 5.3.

Fig. 5.18

Linear smoothing filter vs. median filter applied to a grayscale image corrupted with salt-and-pepper noise (see original in Fig. 5.14(a)). The 3 × 3 linear box filter (a, c) reduces the bright and dark peaks to some extent but is unable to remove them completely. In addition, the entire image is blurred. The median filter (b, d) effectively eliminates the noise dots and also keeps the remaining structures largely intact. However, it also creates small spots of flat intensity that noticeably affect the sharpness.

Fig. 5.19

Effects of a 3 × 3 pixel median filter on different 2D image structures. Isolated dots are eliminated (a), as are thin lines (b). The step edge remains unchanged (c), while a corner is rounded off (d).
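Without the ImageJ plugin framework, the procedure of Prog. 5.5 reduces to the following sketch on a plain 2D array, assuming the image is stored as I[v][u] (row v, column u); the class and method names are hypothetical. Results go into a copy, so already-filtered values are never reused as input.

```java
import java.util.Arrays;

/** Hypothetical array-based median filter of size (2r+1) x (2r+1); border pixels are copied unchanged. */
final class MedianFilter {

    static int[][] apply(int[][] I, int r) {
        int N = I.length, M = I[0].length;
        int[][] out = new int[N][];
        for (int v = 0; v < N; v++) out[v] = I[v].clone();

        int[] A = new int[(2 * r + 1) * (2 * r + 1)];   // pixel vector of the region
        int n = 2 * (r * r + r);                        // index of the center element, Eqn. (5.43)
        for (int v = r; v <= N - r - 1; v++)
            for (int u = r; u <= M - r - 1; u++) {
                int k = 0;
                for (int j = -r; j <= r; j++)
                    for (int i = -r; i <= r; i++)
                        A[k++] = I[v + j][u + i];
                Arrays.sort(A);
                out[v][u] = A[n];
            }
        return out;
    }
}
```

For r = 1, an isolated bright pixel in an otherwise dark image is replaced by 0, a one-pixel-wide vertical line disappears, and a vertical step edge passes through unchanged, in line with Fig. 5.19(a-c).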


5.4.3  Weighted  Median  Filter 

The  median  is  a  rank  order  statistic,  and  in  a  sense  the  “majority”  of 
the  pixel  values  involved  determine  the  result.  A  single  exceptionally 
high  or  low  value  (an  “outlier”)  cannot  influence  the  result  much  but 
only  shift  the  result  up  or  down  to  the  next  value.  Thus  the  median 
(in  contrast  to  the  linear  average)  is  considered  a  “robust”  measure. 
In  an  ordinary  median  filter,  each  pixel  in  the  filter  region  has  the 
same  influence,  regardless  of  its  distance  from  the  center. 
Prog. 5.5

Median filter of arbitrary size (Plugin Filter_Median). An array A of type int is defined to hold the region’s pixel values for each filter position (u, v). This array is sorted by using the Java utility method Arrays.sort(). The center element of the sorted vector (A[n]) is taken as the median value and stored in the original image.

import ij.ImagePlus;
import ij.plugin.filter.PlugInFilter;
import ij.process.ImageProcessor;
import java.util.Arrays;

public class Filter_Median implements PlugInFilter {

    final int r = 4; // specifies the size of the filter

    // setup() is required by the PlugInFilter interface; accept 8-bit grayscale images
    public int setup(String arg, ImagePlus imp) {
        return DOES_8G;
    }

    public void run(ImageProcessor ip) {
        int M = ip.getWidth();
        int N = ip.getHeight();
        ImageProcessor copy = ip.duplicate();

        // vector to hold pixels from (2r+1)x(2r+1) neighborhood:
        int[] A = new int[(2 * r + 1) * (2 * r + 1)];

        // index of center vector element n = 2(r^2 + r):
        int n = 2 * (r * r + r);

        // visit all positions where the filter region fits inside the image:
        for (int u = r; u <= M - r - 1; u++) {
            for (int v = r; v <= N - r - 1; v++) {
                // fill the pixel vector A for filter position (u,v):
                int k = 0;
                for (int i = -r; i <= r; i++) {
                    for (int j = -r; j <= r; j++) {
                        A[k] = copy.getPixel(u + i, v + j);
                        k++;
                    }
                }
                // sort vector A and take the center element A[n]:
                Arrays.sort(A);
                ip.putPixel(u, v, A[n]);
            }
        }
    }
}


The weighted median filter assigns individual weights to the positions in the filter region, which can be interpreted as the “number of votes” for the corresponding pixel values. Similar to the coefficient matrix H of a linear filter, the distribution of weights is specified by a weight matrix W, with W(i,j) ∈ ℕ. To compute the result of the modified filter, each pixel value I(u + i, v + j) involved is inserted W(i,j) times into the extended pixel vector

$$A = (a_0, \ldots, a_{L-1}) \quad\text{of length}\quad L = \sum_{(i,j)\in R} W(i,j). \tag{5.44}$$

This vector is again sorted, and the resulting center value is taken as the median, as in the standard median filter. Figure 5.21 illustrates the computation of the weighted median filter using the 3 × 3 weight matrix

$$W = \begin{bmatrix} 1 & 2 & 1 \\ 2 & 3 & 2 \\ 1 & 2 & 1 \end{bmatrix}, \tag{5.45}$$

which requires an extended pixel vector of length L = 15, equal to the sum of the weights in W. If properly used, the weighted median filter yields effective noise removal with good preservation of structural details (see Fig. 5.20 for an example).

Fig. 5.20

Ordinary vs. weighted median filter. Compared to the ordinary median filter (a, c), the weighted median (b, d) shows superior preservation of structural details. Both filters are of size 3 × 3; the weight matrix in Eqn. (5.45) was used for the weighted median filter.

Of course this method may also be used to implement ordinary median filters of nonrectangular shape; for example, a cross-shaped median filter can be defined with the weight matrix

$$W^{+} = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 1 & 1 \\ 0 & 1 & 0 \end{bmatrix}. \tag{5.46}$$


Not  every  arrangement  of  weights  is  useful,  however.  In  particular,  if 
the  weight  assigned  to  the  center  pixel  is  greater  than  the  sum  of  all 
other  weights,  then  that  pixel  would  always  have  the  “majority  vote” 
and  dictate  the  resulting  value,  thus  inhibiting  any  filter  effect. 
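The vote counting of Eqn. (5.44) can be sketched for a single filter position. A minimal sketch with a hypothetical method name, assuming the region and weight matrices have the same size; zero weights, as in the cross-shaped W⁺ of Eqn. (5.46), simply leave the corresponding pixel out.

```java
import java.util.Arrays;

/** Hypothetical weighted median (Eqn. 5.44): each value I(u+i, v+j) is inserted W(i,j) times. */
final class WeightedMedian {

    /** region and W have equal size; returns the center of the sorted extended vector. */
    static int weightedMedian(int[][] region, int[][] W) {
        int L = 0;
        for (int[] row : W) for (int w : row) L += w;   // L = sum of all weights
        int[] A = new int[L];                           // extended pixel vector
        int k = 0;
        for (int j = 0; j < W.length; j++)
            for (int i = 0; i < W[j].length; i++)
                for (int c = 0; c < W[j][i]; c++)       // W(i,j) "votes" for this pixel
                    A[k++] = region[j][i];
        Arrays.sort(A);
        return A[L / 2];   // center element (L is odd for the weight matrices used here)
    }
}
```

For the hypothetical region {{9, 1, 9}, {1, 5, 1}, {9, 1, 9}}, the plain 3 × 3 median (all weights 1) is 5, whereas the weights of Eqn. (5.45) give 1, because the eight votes for the value 1 now form the majority of the 15 entries.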


5.4.4  Other  Nonlinear  Filters 

Median and weighted median filters are two examples of nonlinear filters that are easy to describe and frequently used. Since “nonlinear” refers to anything that is not linear, there are a multitude of filters that fall into this category, including the morphological filters for binary and grayscale images, which are discussed in Ch. 9. Other types of nonlinear filters, such as the corner detector described in Ch. 7, are often described algorithmically and thus defy a simple, compact description.

Fig. 5.21

Weighted median example. Each pixel value is inserted into the extended pixel vector multiple times, as specified by the weight matrix W. For example, the value 0 from the center pixel is inserted three times (since W(0,0) = 3) and the pixel value 7 twice. The pixel vector is sorted and the center value (2) is taken as the median.

In  contrast  to  the  linear  case,  there  is  usually  no  “strong  theory” 
for  nonlinear  filters  that  could,  for  example,  describe  the  relationship 
between  the  sum  of  two  images  and  the  results  of  a  median  filter, 
as  does  Eqn.  (5.23)  for  linear  convolution.  Similarly,  not  much  (if 
anything)  can  be  stated  in  general  about  the  effects  of  nonlinear 
filters  in  frequency  space. 


5.5 Implementing Filters

5.5.1  Efficiency  of  Filter  Programs 

Computing the results of filters is computationally expensive in most cases, especially with large images, large filter kernels, or both. Given an image of size M × N and a filter kernel of size (2K + 1) × (2L + 1), a direct implementation requires

$$(2K+1) \cdot (2L+1) \cdot M \cdot N \approx 4KLMN \qquad (5.47)$$

operations, namely multiplications and additions (in the case of a linear filter). Thus if both the image and the filter are simply assumed to be of size N × N, the time complexity of direct filtering is O(N⁴). As described in Sec. 5.3.3, substantial savings are possible when large 2D filters can be decomposed (separated) into smaller, possibly 1D filters.
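To put numbers to Eqn. (5.47), the sketch below compares the operation count of a direct filter with that of the same filter separated into a horizontal and a vertical 1D pass. It counts one multiply-add per kernel coefficient and pixel and ignores border handling. Both of these are simplifying assumptions, and the names are illustrative.

```java
class FilterCost {

    // Operations for a direct (2K+1) x (2L+1) filter on an M x N image,
    // counting one multiply-add per kernel coefficient and pixel.
    static long directOps(long M, long N, long K, long L) {
        return (2 * K + 1) * (2 * L + 1) * M * N;
    }

    // Operations if the kernel separates into a horizontal 1D filter of length
    // 2K+1 followed by a vertical 1D filter of length 2L+1.
    static long separableOps(long M, long N, long K, long L) {
        return ((2 * K + 1) + (2 * L + 1)) * M * N;
    }

    public static void main(String[] args) {
        long M = 1000, N = 1000;
        for (long K : new long[] {1, 3, 7, 15}) {            // square kernels 3x3 ... 31x31
            long d = directOps(M, N, K, K);
            long s = separableOps(M, N, K, K);
            System.out.printf("%2d x %2d: direct %,d, separable %,d, ratio %.1f%n",
                    2 * K + 1, 2 * K + 1, d, s, (double) d / s);
        }
    }
}
```

For a 31 × 31 kernel, the separated version needs 62 instead of 961 operations per pixel, a ratio of about 15.5.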

The programming examples in this chapter are deliberately designed to be simple and easy to understand, and none of the solutions shown is particularly efficient. Possibilities for tuning and code optimization exist in many places. It is particularly important to move all unnecessary instructions out of inner loops if possible because these are executed most often. This applies especially to “expensive” instructions, such as method invocations, which may be relatively time-consuming.

In the examples, we have intentionally used the ImageJ standard methods getPixel() for reading and putPixel() for writing image pixels, which is the simplest and safest approach to accessing image data but also the slowest, of course. Substantial speed can be gained by using the quicker read and write methods get() and set() defined for class ImageProcessor and its subclasses. Note, however, that these methods do not check if the passed image coordinates are valid. Maximum performance can be obtained by accessing the pixel arrays directly.
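The following sketch illustrates both points using a plain Java array instead of an ImageJ ImageProcessor, so no ImageJ API is involved. The image is stored row by row in a 1D int array, with pixel (u, v) at index v·M + u, and the row offsets are computed once per row rather than in the innermost loop. The 3 × 3 box filter, the rounding, and the choice of copying border pixels unchanged are assumptions made for the example.

```java
class BoxFilterArray {

    // 3x3 box filter on an M x N image stored row by row in a 1D array
    // (pixel (u, v) at index v * M + u). Border pixels are copied unchanged.
    static int[] box3x3(int[] pixels, int M, int N) {
        int[] out = pixels.clone();                 // border pixels keep their original values
        for (int v = 1; v < N - 1; v++) {
            int above = (v - 1) * M;                // row offsets are computed once per row,
            int row = v * M;                        // not inside the innermost loop
            int below = (v + 1) * M;
            for (int u = 1; u < M - 1; u++) {
                int sum = pixels[above + u - 1] + pixels[above + u] + pixels[above + u + 1]
                        + pixels[row + u - 1]   + pixels[row + u]   + pixels[row + u + 1]
                        + pixels[below + u - 1] + pixels[below + u] + pixels[below + u + 1];
                out[row + u] = (sum + 4) / 9;       // rounded mean, assuming non-negative values
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[] img = {0, 0, 0, 0,
                     0, 9, 9, 0,
                     0, 0, 0, 0};                   // M = 4, N = 3
        System.out.println(java.util.Arrays.toString(box3x3(img, 4, 3)));
        // prints [0, 0, 0, 0, 0, 2, 2, 0, 0, 0, 0, 0]
    }
}
```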

5.5.2  Handling  Image  Borders 

As mentioned briefly in Sec. 5.2.2, the image borders require special attention in most filter implementations. We have argued that theoretically no filter results can be computed at positions where the filter matrix is not fully contained in the image array. Thus any filter operation would reduce the size of the resulting image, which is not acceptable in most applications. While no formally correct remedy exists, there are several more or less practical methods for handling the remaining border regions:

Method 1: Set the unprocessed pixels at the borders to some constant value (e.g., “black”). This is certainly the simplest method, but not acceptable in many situations because the region of validly filtered pixels shrinks with every filter operation.

Method  2:  Set  the  unprocessed  pixels  to  the  original  (unfiltered) 
image  values.  Usually  the  results  are  unacceptable,  too,  due  to 
the  noticeable  difference  between  filtered  and  unprocessed  image 
parts. 

Method  3:  Expand  the  image  by  “padding”  additional  pixels 
around  it  and  apply  the  filter  to  the  border  regions  as  well.  Fig. 
5.22  shows  different  options  for  padding  images. 

A. The pixels outside the image have a constant value (e.g., “black” or “gray”, see Fig. 5.22(a)). This may produce strong artifacts at the image borders, particularly when large filters are used.

B. The border pixels extend beyond the image boundaries (Fig. 5.22(b)). Only minor artifacts can be expected at the borders. The method is also simple to compute and is thus often considered the method of choice.

C.  The  image  is  mirrored  at  each  of  its  four  boundaries  (Fig. 
5.22(c)).  The  results  will  be  similar  to  those  of  the  previous 
method  unless  very  large  filters  are  used. 

D. The image repeats periodically in the horizontal and vertical directions (Fig. 5.22(d)). This may seem strange at first, and the results are generally not satisfactory. However, in discrete spectral analysis, the image is implicitly treated as a periodic function, too (see Ch. 18). Thus, if the image is filtered in the frequency domain, the results will be equal to filtering in the space domain under this repetitive model.

Fig. 5.22

Methods for padding the image to facilitate filtering along the borders. The assumption is that the (nonexisting) pixels outside the original image are either set to some constant value (a), take on the value of the closest border pixel (b), are mirrored at the image boundaries (c), or repeat periodically along the coordinate axes (d).
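In code, padding usually does not require building an enlarged image. It is enough to map an out-of-range coordinate back into the valid range whenever a pixel is read. The sketch below does this for one axis, covering padding options A to D. The names are hypothetical, and the mirroring rule used here (the border pixel is repeated, so index −1 maps to 0) is only one of several conventions in use.

```java
class Padding {

    enum Mode { CONSTANT, NEAREST, MIRROR, PERIODIC }

    // Maps coordinate u (possibly outside 0..size-1) to a valid index, or returns -1
    // in CONSTANT mode, meaning "use the constant background value".
    static int mapIndex(int u, int size, Mode mode) {
        if (u >= 0 && u < size) return u;
        switch (mode) {
            case CONSTANT:                                  // (a) constant value
                return -1;
            case NEAREST:                                   // (b) extend border pixels
                return Math.min(Math.max(u, 0), size - 1);
            case MIRROR: {                                  // (c) mirror at the boundaries
                int period = 2 * size;
                int m = Math.floorMod(u, period);
                return (m < size) ? m : period - 1 - m;
            }
            case PERIODIC:                                  // (d) periodic repetition
                return Math.floorMod(u, size);
            default:
                throw new IllegalArgumentException("unknown mode");
        }
    }

    // Reads pixel (u, v) from image I (indexed I[v][u]) with the given padding mode;
    // 'background' is the value used in CONSTANT mode.
    static int getPixel(int[][] I, int u, int v, Mode mode, int background) {
        int N = I.length, M = I[0].length;
        int uu = mapIndex(u, M, mode), vv = mapIndex(v, N, mode);
        return (uu < 0 || vv < 0) ? background : I[vv][uu];
    }

    public static void main(String[] args) {
        for (Mode m : Mode.values()) {
            StringBuilder sb = new StringBuilder(m + ":");
            for (int u = -3; u <= 6; u++) sb.append(' ').append(mapIndex(u, 4, m));
            System.out.println(sb);
        }
    }
}
```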


None of these methods is perfect and, as usual, the right choice depends upon the type of image and the filter applied. Notice also that the special treatment of the image borders may sometimes require more programming effort (and computing time) than the processing of the interior image.


5.5.3  Debugging  Filter  Programs 

Experience shows that programming errors can hardly ever be avoided, even by experienced practitioners. Unless errors occur during execution (usually caused by trying to access nonexistent array elements), filter programs always “do something” to the image that may be similar but not identical to the expected result. To ensure that the code operates correctly, it is not advisable to start with full, large images but first to experiment with small test cases for which the outcome can easily be predicted. Particularly when implementing linear filters, a first “litmus test” should always be to inspect the impulse response of the filter (as described in Sec. 5.3.4) before processing any real images.
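Such a litmus test is easy to automate. The sketch below uses hypothetical names and a plain 2D array instead of an ImageJ image. It convolves a small image containing a single unit impulse with a 3 × 3 kernel and checks that the kernel reappears unchanged around the impulse. The sketch assumes the convolution form I′(u, v) = Σ I(u − i, v − j) · H(i, j) and treats pixels outside the image as zero. If a filter is implemented in the correlation form Σ I(u + i, v + j) · H(i, j) instead, its impulse response is H rotated by 180°. An asymmetric test kernel, like the one used here, makes this difference visible.

```java
class ImpulseTest {

    // Linear convolution of image I (indexed I[v][u]) with a 3x3 kernel H
    // (indexed H[j+1][i+1]): I'(u,v) = sum_{i,j} I(u-i, v-j) * H(i,j).
    // Pixels outside the image are treated as 0.
    static double[][] convolve3x3(double[][] I, double[][] H) {
        int N = I.length, M = I[0].length;
        double[][] R = new double[N][M];
        for (int v = 0; v < N; v++)
            for (int u = 0; u < M; u++) {
                double sum = 0;
                for (int j = -1; j <= 1; j++)
                    for (int i = -1; i <= 1; i++) {
                        int uu = u - i, vv = v - j;
                        if (uu >= 0 && uu < M && vv >= 0 && vv < N)
                            sum += I[vv][uu] * H[j + 1][i + 1];
                    }
                R[v][u] = sum;
            }
        return R;
    }

    // Filters a 5x5 image with a unit impulse at its center and checks that the
    // 3x3 neighborhood of the center reproduces H exactly.
    static boolean impulseResponseMatches(double[][] H) {
        double[][] delta = new double[5][5];
        delta[2][2] = 1.0;
        double[][] R = convolve3x3(delta, H);
        for (int j = -1; j <= 1; j++)
            for (int i = -1; i <= 1; i++)
                if (R[2 + j][2 + i] != H[j + 1][i + 1]) return false;
        return true;
    }

    public static void main(String[] args) {
        double[][] H = {{0, 1, 2}, {3, 4, 5}, {6, 7, 8}};   // deliberately asymmetric
        System.out.println(impulseResponseMatches(H));        // prints true
    }
}
```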
5.6 Filter Operations in ImageJ


ImageJ offers a collection of readily available filter operations, many of them contributed by other authors using different styles of implementation. Most of the available operations can also be invoked via ImageJ’s Process menu.


5.6.1  Linear  Filters 

Filters based on linear convolution are implemented by the ImageJ plugin class ij.plugin.filter.Convolver, which offers useful “public” methods in addition to the standard run() method. Usage of this class is illustrated by the following example that convolves an 8-bit grayscale image with the filter kernel from Eqn. (5.7):

$$H = \begin{bmatrix} 0.075 & 0.125 & 0.075 \\ 0.125 & 0.200 & 0.125 \\ 0.075 & 0.125 & 0.075 \end{bmatrix}$$


In the following run() method, we first define the filter matrix H as a 1D float array (notice the syntax for the float constants “0.075f”, etc.) and then create a new instance (cv) of class Convolver:


import ij.plugin.filter.Convolver;
import ij.process.ImageProcessor;
...

public void run(ImageProcessor I) {
    float[] H = {   // coefficient array H is one-dimensional!
        0.075f, 0.125f, 0.075f,
        0.125f, 0.200f, 0.125f,
        0.075f, 0.125f, 0.075f };
    Convolver cv = new Convolver();
    cv.setNormalize(true);      // turn on filter normalization
    cv.convolve(I, H, 3, 3);    // apply the filter H to I
}

The invocation of the method convolve() applies the filter H to the image I. It requires two additional arguments for the dimensions of the filter matrix since H is passed as a 1D array. The image I is destructively modified by the convolve operation.

In this case, one could have also used the nonnormalized, integer-valued filter matrix given in Eqn. (5.10) because convolve() normalizes the given filter automatically (after cv.setNormalize(true)).
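Normalization itself can be sketched independently of ImageJ: every coefficient is divided by the sum of all coefficients. For instance, the integer matrix [3 5 3; 5 8 5; 3 5 3] sums to 40, and dividing by 40 gives exactly the coefficients of H above (3/40 = 0.075, 5/40 = 0.125, 8/40 = 0.2). The helper below is hypothetical and is not part of the Convolver API. It leaves kernels whose coefficients sum to zero (such as derivative filters) unscaled, since they cannot be normalized this way.

```java
class KernelNormalize {

    // Returns a copy of the (row-major, 1D) kernel H scaled so that its coefficients
    // sum to 1. Kernels with a zero sum are returned unscaled.
    static float[] normalize(float[] H) {
        double sum = 0;
        for (float h : H) sum += h;
        float[] Hn = H.clone();
        if (sum != 0) {
            for (int k = 0; k < Hn.length; k++)
                Hn[k] = (float) (H[k] / sum);
        }
        return Hn;
    }

    public static void main(String[] args) {
        float[] H = {3, 5, 3,
                     5, 8, 5,
                     3, 5, 3};
        System.out.println(java.util.Arrays.toString(normalize(H)));
    }
}
```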

5.6.2  Gaussian  Filters 

The ImageJ class ij.plugin.filter.GaussianBlur implements a simple Gaussian blur filter with arbitrary radius (σ). The filter uses separable 1D Gaussians as described in Sec. 5.3.3. Here is an example showing its application with σ = 2.5:

import ij.plugin.filter.GaussianBlur;
import ij.process.ImageProcessor;
...

public void run(ImageProcessor I) {
    GaussianBlur gb = new GaussianBlur();
    double sigmaX = 2.5;
    double sigmaY = sigmaX;
    double accuracy = 0.01;
    gb.blurGaussian(I, sigmaX, sigmaY, accuracy);
}

The  accuracy  value  specifies  the  size  of  the  discrete  filter  kernel. 
Higher  accuracy  reduces  truncation  errors  but  requires  larger  kernels 
and  more  processing  time. 
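The trade-off between accuracy and kernel size can be made concrete with a small kernel generator. The sketch below does not reproduce GaussianBlur's internal rule. Instead, it assumes that the kernel is cut off at the smallest integer radius r for which the unnormalized Gaussian exp(−r²/(2σ²)) falls below the given accuracy value, and then normalizes the sampled 1D kernel so that its coefficients sum to 1. All names are hypothetical.

```java
class GaussKernel {

    // Smallest integer r with exp(-r^2 / (2 sigma^2)) < accuracy,
    // i.e., r > sigma * sqrt(-2 ln(accuracy)). Assumes 0 < accuracy < 1.
    static int radiusFor(double sigma, double accuracy) {
        return (int) Math.floor(sigma * Math.sqrt(-2.0 * Math.log(accuracy))) + 1;
    }

    // Sampled 1D Gaussian kernel of length 2r + 1, normalized to sum 1.
    static double[] makeKernel(double sigma, double accuracy) {
        int r = radiusFor(sigma, accuracy);
        double[] h = new double[2 * r + 1];
        double sum = 0;
        for (int i = -r; i <= r; i++) {
            h[i + r] = Math.exp(-(i * i) / (2 * sigma * sigma));
            sum += h[i + r];
        }
        for (int k = 0; k < h.length; k++) h[k] /= sum;
        return h;
    }

    public static void main(String[] args) {
        for (double acc : new double[] {0.1, 0.01, 0.001}) {
            System.out.println("accuracy " + acc + ": kernel length "
                    + makeKernel(2.5, acc).length);
        }
    }
}
```

For σ = 2.5, this rule yields kernel lengths of 13, 17, and 21 for accuracy values of 0.1, 0.01, and 0.001, respectively.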

An alternative implementation of separable Gaussian filters can be found in Prog. 6.1 (see p. 145), which uses the method makeGaussKernel1d() defined in Prog. 5.4 (page 104) for dynamically calculating the required 1D filter kernels.

5.6.3  Nonlinear  Filters 

A small set of nonlinear filters is implemented in the ImageJ class ij.plugin.filter.RankFilters, including the minimum, maximum, and standard median filters. The filter region is (approximately) circular with variable radius. Here is an example that applies three different filters with the same radius in sequence:

import ij.plugin.filter.RankFilters;
import ij.process.ImageProcessor;
...

public void run(ImageProcessor I) {
    RankFilters rf = new RankFilters();
    double radius = 3.5;
    rf.rank(I, radius, RankFilters.MIN);      // minimum filter
    rf.rank(I, radius, RankFilters.MAX);      // maximum filter
    rf.rank(I, radius, RankFilters.MEDIAN);   // median filter
}
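The following sketch shows how a rank filter with an (approximately) circular support region might work, using a plain 2D int array. It assumes that the region consists of all offsets (i, j) with i² + j² ≤ r². This is one common discretization of a disk, but not necessarily the exact rule used by RankFilters. The sketch also clamps coordinates at the image border (option B in Sec. 5.5.2). Class, enum, and method names are hypothetical.

```java
import java.util.Arrays;

class DiskRankFilter {

    enum Kind { MIN, MAX, MEDIAN }

    // Min, max, or median filter with a disk-shaped support region of radius r,
    // applied to image I (indexed I[v][u]). The region contains all offsets (i, j)
    // with i*i + j*j <= r*r; out-of-range coordinates are clamped to the border.
    static int[][] filter(int[][] I, double r, Kind kind) {
        int N = I.length, M = I[0].length;
        int rr = (int) Math.floor(r);
        int count = 0;                                   // collect the disk offsets once
        int[][] offs = new int[(2 * rr + 1) * (2 * rr + 1)][];
        for (int j = -rr; j <= rr; j++)
            for (int i = -rr; i <= rr; i++)
                if (i * i + j * j <= r * r) offs[count++] = new int[] {i, j};
        int[][] R = new int[N][M];
        int[] A = new int[count];
        for (int v = 0; v < N; v++)
            for (int u = 0; u < M; u++) {
                for (int k = 0; k < count; k++) {
                    int uu = Math.min(Math.max(u + offs[k][0], 0), M - 1);
                    int vv = Math.min(Math.max(v + offs[k][1], 0), N - 1);
                    A[k] = I[vv][uu];
                }
                Arrays.sort(A);
                switch (kind) {
                    case MIN:  R[v][u] = A[0]; break;
                    case MAX:  R[v][u] = A[count - 1]; break;
                    default:   R[v][u] = A[count / 2]; break;   // MEDIAN
                }
            }
        return R;
    }

    public static void main(String[] args) {
        int[][] img = new int[5][5];
        img[2][2] = 100;                                 // single bright outlier
        int[][] med = filter(img, 1.0, Kind.MEDIAN);
        System.out.println(med[2][2]);                   // prints 0: outlier removed
    }
}
```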


5.7  Exercises 


Exercise  5.1.  Explain  why  the  “custom  filter”  in  Adobe  Photoshop 
(Fig.  5.6)  is  not  strictly  a  linear  filter. 

Exercise 5.2. Determine the possible maximum and minimum results (pixel values) for the following linear filter, when applied to an 8-bit grayscale image (with pixel values in the range [0, 255]):

$$H = \begin{bmatrix} -1 & -2 & 0 \\ -2 & 0 & 2 \\ 0 & 2 & 1 \end{bmatrix}$$


Assume  that  no  clamping  of  the  results  occurs. 

Exercise 5.3. Modify the ImageJ plugin shown in Prog. 5.3 such that the image borders are processed as well. Use one of the methods for extending the image outside its boundaries as described in Sec. 5.5.2.

Exercise 5.4. Show that a standard box filter is not isotropic (i.e., does not smooth the image identically in all directions).


Exercise 5.5. Explain why the clamping of results to a limited range of pixel values may violate the linearity property (Sec. 5.3.2) of linear filters.


Exercise 5.6. Verify the properties of the impulse function with respect to linear filters (see Eqn. (5.37)). Create a black image with a white pixel at its center and use this image as the 2D impulse. See if linear filters really deliver the filter matrix H as their impulse response.

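Exercise 5.7. Describe the effects of the linear filters with the following kernels:

$$H_1 = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}, \quad H_2 = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 0 \end{bmatrix}, \quad H_3 = \frac{1}{3} \begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix}$$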

Exercise  5.8.  Design  a  linear  filter  (kernel)  that  creates  a  horizontal 
blur  over  a  length  of  7  pixels,  thus  simulating  the  effect  of  camera 
movement  during  exposure. 

Exercise 5.9. Compare the number of processing steps required for non-separable linear filters and x/y-separable filters sized 5 × 5, 11 × 11, 25 × 25, and 51 × 51 pixels. Compute the speed gain resulting from separability in each case.


Exercise 5.10. Program your own ImageJ plugin that implements a Gaussian smoothing filter with variable filter width (radius σ). The plugin should dynamically create the required filter kernels with a size of at least 5σ in both directions. Make use of the fact that the Gaussian function is x/y-separable (see Sec. 5.3.3).


Exercise 5.11. The Laplacian of Gaussian (LoG) filter (see Fig. 5.8) is based on the sum of the second derivatives of the 2D Gaussian. It is defined as

$$L_\sigma(x,y) = \frac{x^2 + y^2 - 2\sigma^2}{\sigma^4} \cdot e^{-\frac{x^2 + y^2}{2\sigma^2}} \qquad (5.48)$$

Implement the LoG filter as an ImageJ plugin of variable width (σ), analogous to Exercise 5.10. Find out if the LoG function is x/y-separable.


Exercise 5.12. Implement a circular (i.e., disk-shaped) median filter for grayscale images. Make the filter’s radius r adjustable in the range from 1 to 10 pixels (e.g., using ImageJ’s GenericDialog class). Use a binary (0/1) disk-shaped mask to represent the filter’s support region R, with a minimum size of (2r + 1) × (2r + 1), as shown in Fig. 5.23(a). Create this mask dynamically for the chosen filter radius r (see Fig. 5.23(c–h) for typical results).

Exercise 5.13. Implement a weighted median filter (see Sec. 5.4.3) as an ImageJ plugin, specifying the weights as a constant, 2D int array. Test the filter on suitable images and compare the results with those from a standard median filter. Explain why, for example, the following weight matrix does not make sense:

$$W = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 5 & 1 \\ 0 & 1 & 0 \end{bmatrix}$$


Fig. 5.23

Disk-shaped median filter. Example of a binary mask to represent the support region R with radius r = 8 (a). The origin of the filter region is located at its center. Synthetic test image (b). Results of the median filter for r = 1, ..., 6 (c–h).


Exercise 5.14. The “jitter” filter is a (quite exotic) example for a nonhomogeneous filter. For each image position, it selects a space-variant filter kernel (of size 2r + 1) containing a single, randomly placed impulse (1); for example,

$$H_{u,v} = \begin{bmatrix} 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix} \qquad (5.49)$$

for r = 2. The position of the 1-value in the kernel $H_{u,v}$ is uniformly distributed in the range i, j ∈ [−r, r]; thus the filter effectively picks a random pixel value from the surrounding (2r + 1) × (2r + 1) neighborhood. Implement this filter for r = 3, 5, 10, as shown in Fig. 5.24. Is this filter linear or nonlinear? Develop another version using a Gaussian random distribution.


6

Edges and Contours


Prominent image “events” originating from local changes in intensity or color, such as edges and contours, are of high importance for the visual perception and interpretation of images. The perceived amount of information in an image appears to be directly related to the distinctiveness of the contained structures and discontinuities. In fact, edge-like structures and contours seem to be so important for our human visual system that a few lines in a caricature or illustration are often sufficient to unambiguously describe an object or a scene. It is thus no surprise that the enhancement and detection of edges have been a traditional and important topic in image processing as well. In this chapter, we first look at simple methods for localizing edges and then attend to the related issue of image sharpening.


6.1  What  Makes  an  Edge? 

Edges and contours play a dominant role in human vision and probably in many other biological vision systems as well. Not only are edges visually striking, but it is often possible to describe or reconstruct a complete figure from a few key lines, as the example in Fig. 6.1 shows. But how do edges arise, and how can they be technically localized in an image?


Fig. 6.1

Edges play an important role in human vision. Original image (a) and edge image (b).


© Springer-Verlag London 2016


Fig. 6.2

Sample image and first derivative in one dimension: original image (a), horizontal intensity profile f(x) along the center image line (b), and first derivative f′(x) (c).

Edges can roughly be described as image positions where the local intensity changes distinctly along a particular orientation. The stronger the local intensity change, the higher is the evidence for an edge at that position. In mathematics, the amount of change with respect to spatial distance is known as the first derivative of a function, and we thus start with this concept to develop our first simple edge detector.


6.2  Gradient-Based  Edge  Detection 

For simplicity, we first investigate the situation in only one dimension, assuming that the image contains a single bright region at the center surrounded by a dark background (Fig. 6.2(a)). In this case, the intensity profile along one image line would look like the 1D function f(x), as shown in Fig. 6.2(b). Taking the first derivative of the function f,

$$f'(x) = \frac{df}{dx}(x), \qquad (6.1)$$

results in a positive swing at those positions where the intensity rises and a negative swing where the value of the function drops (Fig. 6.2(c)).


Unlike in the continuous case, however, the first derivative is undefined for a discrete function f(u) (such as the line profile of a real image), and some method is needed to estimate it. Figure 6.3 shows the basic idea, again for the 1D case: the first derivative of a continuous function at position x can be interpreted as the slope of its tangent at this position. One simple method for roughly approximating the slope of the tangent for a discrete function f(u) at position u is to fit a straight line through the neighboring function values f(u − 1) and f(u + 1),

$$\frac{df}{dx}(u) \approx \frac{f(u+1) - f(u-1)}{(u+1) - (u-1)} = \frac{f(u+1) - f(u-1)}{2}. \qquad (6.2)$$

Of course, the same method can be applied in the vertical direction to estimate the first derivative along the y-axis, that is, along the image columns.
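When applied to a sampled line profile, Eqn. (6.2) becomes a single loop. In the sketch below (hypothetical names), the first and last samples, where one of the two neighbors is missing, are simply set to 0. This border treatment is an assumption. The test profile mimics Fig. 6.2: a bright plateau on a dark background produces a positive swing at the rising edge and a negative swing at the falling edge.

```java
class CentralDifference {

    // Estimates f'(u) ~ (f(u+1) - f(u-1)) / 2 for each interior sample u;
    // the first and last entries are left at 0 (no two-sided neighbors).
    static double[] derivative(double[] f) {
        double[] d = new double[f.length];
        for (int u = 1; u < f.length - 1; u++)
            d[u] = (f[u + 1] - f[u - 1]) / 2.0;
        return d;
    }

    public static void main(String[] args) {
        double[] profile = {0, 0, 0, 10, 10, 10, 0, 0, 0};   // bright region, dark background
        System.out.println(java.util.Arrays.toString(derivative(profile)));
        // prints [0.0, 0.0, 5.0, 5.0, 0.0, -5.0, -5.0, 0.0, 0.0]
    }
}
```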


6.2.1  Partial  Derivatives  and  the  Gradient 


A derivative of a multi-dimensional function taken along one of its coordinate axes is called a partial derivative; for example,

$$I_x(u,v) = \frac{\partial I}{\partial u}(u,v) \quad\text{and}\quad I_y(u,v) = \frac{\partial I}{\partial v}(u,v) \qquad (6.3)$$


are the partial derivatives of the 2D image function I(u, v) along the u and v axes, respectively.¹ The vector

$$\nabla I(u,v) = \begin{pmatrix} I_x(u,v) \\ I_y(u,v) \end{pmatrix} \qquad (6.4)$$

is called the gradient of the function I at position (u, v). The magnitude of the gradient,

$$|\nabla I| = \sqrt{I_x^2 + I_y^2}, \qquad (6.5)$$


is invariant under image rotation and thus independent of the orientation of the underlying image structures. This property is important for isotropic localization of edges, and thus |∇I| is the basis of many practical edge detection methods.
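The invariance follows in one line. If the image is rotated by an angle θ, the gradient at corresponding positions is rotated by the same angle, meaning it is multiplied by an orthogonal matrix R_θ, whose transpose is its inverse. Its length is therefore unchanged. This holds for the exact gradient; discrete estimates such as those in Sec. 6.2.2 are only approximately isotropic.

```latex
\mathbf{R}_\theta = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix},
\qquad
|\mathbf{R}_\theta \nabla I|^2
  = (\mathbf{R}_\theta \nabla I)^{\top} (\mathbf{R}_\theta \nabla I)
  = \nabla I^{\top}\, \underbrace{\mathbf{R}_\theta^{\top} \mathbf{R}_\theta}_{\text{identity}}\, \nabla I
  = |\nabla I|^2 .
```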


6.2.2  Derivative  Filters 


The components of the gradient function (Eqn. (6.4)) are simply the first derivatives of the image lines (Eqn. (6.1)) and columns along the horizontal and vertical axes, respectively. The approximation of the first horizontal derivatives (Eqn. (6.2)) can be easily implemented by a linear filter (see Sec. 5.2) with the 1D kernel

$$H^D_x = \begin{bmatrix} -0.5 & 0 & 0.5 \end{bmatrix} = 0.5 \cdot \begin{bmatrix} -1 & 0 & 1 \end{bmatrix}, \qquad (6.6)$$

where the coefficients −0.5 and +0.5 apply to the image elements I(u − 1, v) and I(u + 1, v), respectively. Notice that the center pixel I(u, v) itself is weighted with the zero coefficient and is thus ignored. Analogously, the vertical component of the gradient is obtained with the linear filter

$$H^D_y = \begin{bmatrix} -0.5 \\ 0 \\ 0.5 \end{bmatrix} = 0.5 \cdot \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix}. \qquad (6.7)$$



Fig. 6.3

Estimating the first derivative of a discrete function. The slope of the straight (dashed) line between the neighboring function values f(u − 1) and f(u + 1) is taken as the estimate for the slope of the tangent (i.e., the first derivative) at f(u).


¹ ∂ denotes the partial derivative or “del” operator.


Fig. 6.4

Partial derivatives of a 2D function: synthetic image function I (a); approximate first derivatives in the horizontal direction ∂I/∂u (b) and the vertical direction ∂I/∂v (c); magnitude of the resulting gradient |∇I| (d). In (b) and (c), the lowest (negative) values are shown black, the maximum (positive) values are white, and zero values are gray.


Figure 6.4 shows the results of applying the gradient filters defined in Eqn. (6.6) and Eqn. (6.7) to a synthetic test image.

The orientation dependence of the filter responses can be seen clearly. The horizontal gradient filter $H^D_x$ reacts most strongly to rapid changes along the horizontal direction (i.e., to vertical edges); analogously, the vertical gradient filter $H^D_y$ reacts most strongly to horizontal edges. The filter response is zero in flat image regions (shown as gray in Fig. 6.4(b, c)).
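The two derivative filters and the gradient magnitude from Eqn. (6.5) can be combined in a few lines. This is a minimal sketch that operates on a plain 2D double array (indexed I[v][u], with v pointing down) and is not ImageJ code. Border pixels are left at 0, and the names are hypothetical.

```java
class GradientDemo {

    // Gradient estimates with H_x^D = 0.5*[-1 0 1] and its vertical counterpart H_y^D:
    //   Ix(u,v) = 0.5*(I(u+1,v) - I(u-1,v)),  Iy(u,v) = 0.5*(I(u,v+1) - I(u,v-1)).
    // Returns |grad I| = sqrt(Ix^2 + Iy^2); border pixels remain 0.
    static double[][] gradientMagnitude(double[][] I) {
        int N = I.length, M = I[0].length;
        double[][] E = new double[N][M];
        for (int v = 1; v < N - 1; v++)
            for (int u = 1; u < M - 1; u++) {
                double ix = 0.5 * (I[v][u + 1] - I[v][u - 1]);
                double iy = 0.5 * (I[v + 1][u] - I[v - 1][u]);
                E[v][u] = Math.hypot(ix, iy);
            }
        return E;
    }

    public static void main(String[] args) {
        double[][] step = {{0, 0, 10, 10}, {0, 0, 10, 10}, {0, 0, 10, 10}};   // vertical edge
        double[][] E = gradientMagnitude(step);
        System.out.println(E[1][1] + " " + E[1][2]);   // prints 5.0 5.0
    }
}
```

For a vertical step from 0 to 10, the magnitude is 5 at the two pixels on either side of the step, while in flat regions both differences, and hence the magnitude, are 0.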


6.3  Simple  Edge  Operators 

The local gradient of the image function is the basis of many classical edge-detection operators. Practically, they only differ in the type of filter used for estimating the gradient components and the way these components are combined. In many situations, one is not only interested in the strength of edge points but also in the local direction of the edge. Both types of information are contained in the gradient function and can be easily computed from the directional components. The following small collection describes some frequently used, simple edge operators that have been around for many years and are thus interesting from a historic perspective as well.


6.3.1  Prewitt  and  Sobel  Operators 




The  edge  operators  by  Prewitt  [191]  and  Sobel  [61]  are  two  classic 
methods  that  differ  only  marginally  in  the  derivative  filters  they  use. 


Gradient  filters 

Both operators use linear filters that extend over three adjacent lines and columns, respectively, to counteract the noise sensitivity of the simple (single line/column) gradient operators (Eqns. (6.6) and (6.7)). The Prewitt operator uses the filter kernels

$$H^P_x = \begin{bmatrix} -1 & 0 & 1 \\ -1 & 0 & 1 \\ -1 & 0 & 1 \end{bmatrix} \quad\text{and}\quad H^P_y = \begin{bmatrix} -1 & -1 & -1 \\ 0 & 0 & 0 \\ 1 & 1 & 1 \end{bmatrix}, \qquad (6.8)$$

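which compute the average gradient components across three neighboring lines or columns, respectively. When the filters are written in separated form,

$$H^P_x = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} * \begin{bmatrix} -1 & 0 & 1 \end{bmatrix} \quad\text{and}\quad H^P_y = \begin{bmatrix} 1 & 1 & 1 \end{bmatrix} * \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix}, \qquad (6.9)$$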

respectively, it becomes obvious that $H^P_x$ performs a simple (box) smoothing over three lines before computing the x gradient (Eqn. (6.6)), and analogously $H^P_y$ smooths over three columns before computing the y gradient (Eqn. (6.7)).² Because of the commutativity property of linear convolution, this could equally be described the other way around, with smoothing being applied after the computation of the gradients.

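The filters for the Sobel operator are almost identical; however, the smoothing part assigns higher weight to the current center line and column, respectively:

$$H^S_x = \begin{bmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{bmatrix} \quad\text{and}\quad H^S_y = \begin{bmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ 1 & 2 & 1 \end{bmatrix}. \qquad (6.10)$$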


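The estimates for the local gradient components are obtained from the filter results by appropriate scaling, that is,

$$\nabla I(u,v) \approx \frac{1}{6} \begin{pmatrix} (I * H^P_x)(u,v) \\ (I * H^P_y)(u,v) \end{pmatrix} \qquad (6.11)$$

for the Prewitt operator and

$$\nabla I(u,v) \approx \frac{1}{8} \begin{pmatrix} (I * H^S_x)(u,v) \\ (I * H^S_y)(u,v) \end{pmatrix} \qquad (6.12)$$

for the Sobel operator.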


² In Eqn. (6.9), * is the linear convolution operator (see Sec. 5.3.1).


Fig. 6.5

Calculation of edge magnitude and orientation (geometry).


Edge  strength  and  orientation 

In the following, we denote the scaled filter results (obtained with either the Prewitt or Sobel operator) as

$$I_x = I * H_x \quad\text{and}\quad I_y = I * H_y .$$

In both cases, the local edge strength E(u, v) is defined as the gradient magnitude

$$E(u,v) = \sqrt{I_x^2(u,v) + I_y^2(u,v)} \qquad (6.13)$$

and the local edge orientation angle Φ(u, v) is³

$$\Phi(u,v) = \tan^{-1}\!\left(\frac{I_y(u,v)}{I_x(u,v)}\right) = \operatorname{ArcTan}\big(I_x(u,v), I_y(u,v)\big), \qquad (6.14)$$

as illustrated in Fig. 6.5.

The whole process of extracting the edge magnitude and orientation is summarized in Fig. 6.6. First, the original image I is independently convolved with the two gradient filters $H_x$ and $H_y$, and subsequently the edge strength E and orientation Φ are computed from the filter results. Figure 6.7 shows the edge strength and orientation for two test images, obtained with the Sobel filters in Eqn. (6.10).
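A compact sketch of the process in Fig. 6.6 follows, using the Sobel kernels from Eqn. (6.10) and the 1/8 scaling from Eqn. (6.12). It is written for a plain 2D double array. The kernels are applied in the form Σ I(u + i, v + j) · H(i, j), so that $I_x$ is positive where the intensity increases with u. Border pixels are left at 0, and Φ is obtained with Java's Math.atan2(iy, ix), which corresponds to ArcTan($I_x$, $I_y$) in Eqn. (6.14) (note that Math.atan2 expects the y component first). These choices and all names are assumptions made for illustration.

```java
class SobelEdges {

    static final double[][] HX = {{-1, 0, 1}, {-2, 0, 2}, {-1, 0, 1}};   // H_x^S
    static final double[][] HY = {{-1, -2, -1}, {0, 0, 0}, {1, 2, 1}};   // H_y^S

    // Applies a 3x3 kernel H (indexed H[j+1][i+1]) at interior pixel (u, v)
    // as sum_{i,j} I(u+i, v+j) * H(i, j).
    static double apply(double[][] I, double[][] H, int u, int v) {
        double sum = 0;
        for (int j = -1; j <= 1; j++)
            for (int i = -1; i <= 1; i++)
                sum += I[v + j][u + i] * H[j + 1][i + 1];
        return sum;
    }

    // Returns {E, Phi}: edge strength (Eqn. 6.13) and orientation (Eqn. 6.14),
    // using the 1/8 scaling of Eqn. (6.12). Border pixels remain 0.
    static double[][][] edges(double[][] I) {
        int N = I.length, M = I[0].length;
        double[][] E = new double[N][M], Phi = new double[N][M];
        for (int v = 1; v < N - 1; v++)
            for (int u = 1; u < M - 1; u++) {
                double ix = apply(I, HX, u, v) / 8;
                double iy = apply(I, HY, u, v) / 8;
                E[v][u] = Math.sqrt(ix * ix + iy * iy);
                Phi[v][u] = Math.atan2(iy, ix);      // ArcTan(ix, iy)
            }
        return new double[][][] {E, Phi};
    }

    public static void main(String[] args) {
        double[][] step = {{0, 0, 8, 8}, {0, 0, 8, 8}, {0, 0, 8, 8}};
        double[][][] r = edges(step);
        System.out.println("E = " + r[0][1][1] + ", Phi = " + r[1][1][1]);   // prints E = 4.0, Phi = 0.0
    }
}
```

For a vertical step from 0 to 8, the sketch reports E = 4 and Φ = 0 next to the edge. For a horizontal step that brightens downward, it reports Φ = π/2.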


Fig. 6.6

Typical process of gradient-based edge extraction. The linear derivative filters $H_x$ and $H_y$ produce two gradient images, $I_x$ and $I_y$, respectively. They are used to compute the edge strength E and orientation Φ for each image position (u, v).


The  estimate  of  the  edge  orientation  based  on  the  original  Prewitt 
and  Sobel  filters  is  relatively  inaccurate,  and  improved  versions  of  the 
Sobel  filters  were  proposed  in  [126,  p.  353]  to  minimize  the  orientation 
errors: 

³ See the hints in Sec. F.1.6 in the Appendix for computing the inverse tangent tan⁻¹(y/x) with the ArcTan(x, y) function.


Fig. 6.7

Edge strength and orientation obtained with a Sobel operator. Original images (a), the edge strength E(u, v) (b), and the local edge orientation Φ(u, v) (c). The images in (d) show the orientation angles coded as color hues, with the edge strength controlling the color saturation (see Sec. 12.2.3 for the corresponding definitions).


These edge operators are frequently used because of their good results (see also Fig. 6.11) and simple implementation. The Sobel operator, in particular, is available in many image-processing tools and software packages (including ImageJ).


6.3.2  Roberts  Operator 

As one of the simplest and oldest edge finders, the Roberts operator [199] today is mainly of historic interest. It employs two extremely small filters of size 2 × 2 for estimating the directional gradient along the image diagonals:

$$H^R_1 = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix} \quad\text{and}\quad H^R_2 = \begin{bmatrix} -1 & 0 \\ 0 & 1 \end{bmatrix}. \qquad (6.16)$$


Fig. 6.8

Diagonal gradient components produced by the two Roberts filters.


These  filters  naturally  respond  to  diagonal  edges  but  are  not  highly 
selective  to  orientation;  that  is,  both  filters  show  strong  results  over 
a  relatively  wide  range  of  angles  (Fig.  6.8).  The  local  edge  strength 
is  calculated  by  measuring  the  length  of  the  resulting  2D  vector, 
similar  to  the  gradient  computation  but  with  its  components  rotated 
45°  (Fig.  6.9). 
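Below is a sketch of the Roberts operator that operates on a plain 2D array. It assumes that each 2 × 2 kernel from Eqn. (6.16) is applied with its upper-left element at the current pixel, so that D1(u, v) = I(u + 1, v) − I(u, v + 1) and D2(u, v) = I(u + 1, v + 1) − I(u, v). This anchoring, which shifts the result by half a pixel, is an assumption, as are the names.

```java
class RobertsEdges {

    // Roberts edge strength sqrt(D1^2 + D2^2) for image I (indexed I[v][u]), with
    // both 2x2 filters anchored at their upper-left element:
    //   D1 = I(u+1, v) - I(u, v+1),   D2 = I(u+1, v+1) - I(u, v).
    // The last row and column have no complete 2x2 neighborhood and remain 0.
    static double[][] edgeStrength(double[][] I) {
        int N = I.length, M = I[0].length;
        double[][] E = new double[N][M];
        for (int v = 0; v < N - 1; v++)
            for (int u = 0; u < M - 1; u++) {
                double d1 = I[v][u + 1] - I[v + 1][u];
                double d2 = I[v + 1][u + 1] - I[v][u];
                E[v][u] = Math.sqrt(d1 * d1 + d2 * d2);
            }
        return E;
    }

    public static void main(String[] args) {
        double[][] step = {{0, 10}, {0, 10}};             // vertical step edge
        System.out.println(edgeStrength(step)[0][0]);     // sqrt(200), about 14.14
    }
}
```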


Fig. 6.9

Definition of edge strength for the Roberts operator. The edge strength E(u, v) corresponds to the length of the vector obtained by adding the two orthogonal gradient components (filter results) $D_1(u,v)$ and $D_2(u,v)$.


6.3.3  Compass  Operators 

The  design  of  linear  edge  filters  involves  a  trade-off:  the  stronger 
a  filter  responds  to  edge-like  structures,  the  more  sensitive  it  is  to 
orientation.  In  other  words,  filters  that  are  orientation  insensitive 
tend  to  respond  to  nonedge  structures,  while  the  most  discriminating 
edge  filters  only  respond  to  edges  in  a  narrow  range  of  orientations. 
One  solution  is  to  use  not  only  a  single  pair  of  relatively  “wide”  filters 
for  two  directions  (such  as  the  Prewitt  and  the  simple  Sobel  operator 
discussed  in  Sec.  6.3.1)  but  a  larger  set  of  filters  with  narrowly  spaced 
orientations. 

Extended  Sobel  operator 

Classic examples are the edge operator proposed by Kirsch [136] and the “extended Sobel” or Robinson operator [200], which employs the following eight filters with orientations spaced at 45°:

Only the results of four of these eight filters ($H^{ES}_0, H^{ES}_1, \ldots, H^{ES}_3$) must actually be computed since the remaining four are identical except for the reversed sign. For example, from the fact that $H^{ES}_4 = -H^{ES}_0$ and the convolution being linear (Eqn. (5.22)), it follows that

$$I * H^{ES}_4 = I * (-H^{ES}_0) = -(I * H^{ES}_0), \qquad (6.21)$$

that is, the result for filter $H^{ES}_4$ is simply the negative result for filter $H^{ES}_0$. The directional outputs $D_0, D_1, \ldots, D_7$ for the eight Sobel filters can thus be computed as follows:

$$\begin{aligned} D_0 &\leftarrow I * H^{ES}_0, & D_1 &\leftarrow I * H^{ES}_1, & D_2 &\leftarrow I * H^{ES}_2, & D_3 &\leftarrow I * H^{ES}_3, \\ D_4 &\leftarrow -D_0, & D_5 &\leftarrow -D_1, & D_6 &\leftarrow -D_2, & D_7 &\leftarrow -D_3. \end{aligned} \qquad (6.22)$$

The edge strength $E^{ES}$ at position $(u, v)$ is defined as the maximum of the eight filter outputs; that is,

$$E^{ES}(u,v) = \max\bigl(D_0(u,v), D_1(u,v), \ldots, D_7(u,v)\bigr) = \max\bigl(|D_0(u,v)|, |D_1(u,v)|, |D_2(u,v)|, |D_3(u,v)|\bigr) , \qquad (6.23)$$




and the strongest-responding filter also determines the local edge orientation as

$$\Phi^{ES}(u,v) = \frac{\pi}{4} \cdot j , \quad \text{with} \quad j = \operatorname*{argmax}_{0 \le i \le 7} D_i(u,v) . \qquad (6.24)$$
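The kernel table of the extended Sobel operator is missing from this copy, so the following Java sketch assumes the common form: the Sobel kernel rotated in 45° steps, with the sign and orientation conventions chosen here as an assumption. It evaluates only $H_0^{ES}, \ldots, H_3^{ES}$, obtains $D_4, \ldots, D_7$ by sign reversal as in Eqn. (6.22), and then applies Eqns. (6.23) and (6.24) at a single interior pixel. Images are plain double[][] arrays indexed I[v][u] (row v, column u); the class and method names are illustrative only.

```java
/** Extended Sobel compass operator: edge strength (6.23) and orientation (6.24). */
final class ExtendedSobel {

    // ASSUMED kernels H0..H3 (Sobel kernel rotated in 45-degree steps), stored as
    // H[k][j][i] with j = vertical (row) offset and i = horizontal (column) offset.
    static final int[][][] H = {
        {{-1,  0,  1}, {-2, 0,  2}, {-1, 0, 1}},   // H0
        {{-2, -1,  0}, {-1, 0,  1}, { 0, 1, 2}},   // H1
        {{-1, -2, -1}, { 0, 0,  0}, { 1, 2, 1}},   // H2
        {{ 0, -1, -2}, { 1, 0, -1}, { 2, 1, 0}}    // H3
    };

    /** Response of kernel h at interior pixel (u, v) of image I[v][u]
     *  (kernel applied without mirroring, i.e., as a correlation). */
    static double filterAt(double[][] I, int[][] h, int u, int v) {
        double sum = 0;
        for (int j = -1; j <= 1; j++)
            for (int i = -1; i <= 1; i++)
                sum += I[v + j][u + i] * h[j + 1][i + 1];
        return sum;
    }

    /** Returns {E_ES(u,v), Phi_ES(u,v)} for an interior pixel (u, v). */
    static double[] edgeAt(double[][] I, int u, int v) {
        double[] D = new double[8];
        for (int k = 0; k < 4; k++) {
            D[k] = filterAt(I, H[k], u, v);   // only four filters are evaluated
            D[k + 4] = -D[k];                 // D4..D7 by sign reversal, Eqn. (6.22)
        }
        int jMax = 0;                         // index of the strongest response
        for (int k = 1; k < 8; k++)
            if (D[k] > D[jMax]) jMax = k;
        return new double[] {D[jMax], jMax * Math.PI / 4};
    }
}
```

Because the kernels are applied here without mirroring, a true convolution (mirrored kernels) would flip the sign of every $D_i$; $E^{ES}$ would stay the same while $\Phi^{ES}$ would shift by $\pi$.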


Kirsch  operator 

Another  classic  compass  operator  is  the  one  proposed  by  Kirsch  [136], 
which  also  employs  eight  oriented  filters  with  the  following  kernels: 
Again, because of the symmetries, only four of the eight filters need to be applied and the results may be combined in the same way as already described for the extended Sobel operator.

In  practice,  this  and  other  “compass  operators”  show  only  minor 
benefits  over  the  simpler  operators  described  earlier,  including  the 
small  advantage  of  not  requiring  the  computation  of  square  roots 
(which  is  considered  a  relatively  “expensive”  operation). 

6.3.4  Edge  Operators  in  ImageJ 

The  current  version  of  ImageJ  implements  the  Sobel  operator  (as 
described  in  Eqn.  (6.10))  for  practically  any  type  of  image.  It  can  be 
invoked  via  the 

Process  >  Find  Edges 

menu and is also available through the method void findEdges() for objects of type ImageProcessor.
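As a plain-Java illustration of the kind of computation behind a Sobel edge image (this is not ImageJ's source code), the sketch below computes the unnormalized 3 × 3 Sobel gradient at every pixel and returns its magnitude. Replicating the border pixels is an assumption made here; ImageJ may treat image borders differently. The class name SobelMagnitude is hypothetical.

```java
/** 3x3 Sobel gradient magnitude for a whole image (border pixels replicated). */
final class SobelMagnitude {

    /** Pixel value I[v][u] with coordinates clamped to the image bounds. */
    static double px(double[][] I, int u, int v) {
        int h = I.length, w = I[0].length;
        return I[Math.max(0, Math.min(h - 1, v))][Math.max(0, Math.min(w - 1, u))];
    }

    static double[][] magnitude(double[][] I) {
        int h = I.length, w = I[0].length;
        double[][] E = new double[h][w];
        for (int v = 0; v < h; v++) {
            for (int u = 0; u < w; u++) {
                // horizontal Sobel kernel [-1 0 1; -2 0 2; -1 0 1] (unnormalized)
                double dx = px(I, u + 1, v - 1) + 2 * px(I, u + 1, v) + px(I, u + 1, v + 1)
                          - px(I, u - 1, v - 1) - 2 * px(I, u - 1, v) - px(I, u - 1, v + 1);
                // vertical Sobel kernel (the transpose of the horizontal one)
                double dy = px(I, u - 1, v + 1) + 2 * px(I, u, v + 1) + px(I, u + 1, v + 1)
                          - px(I, u - 1, v - 1) - 2 * px(I, u, v - 1) - px(I, u + 1, v - 1);
                E[v][u] = Math.sqrt(dx * dx + dy * dy);   // one square root per pixel
            }
        }
        return E;
    }
}
```

Scaling both kernels by 1/8 gives the normalized gradient estimate. Unlike the compass operators above, this version needs one square root per pixel.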


6.4  Other  Edge  Operators 

One problem with edge operators based on first derivatives (as described in the previous section) is that each resulting edge is as wide as the underlying intensity transition, and thus edges may be difficult to localize precisely. An alternative class of edge operators makes use of the second derivatives of the image function, including some popular modern edge operators that also address the problem of edges appearing at various levels of scale. These issues are briefly discussed in the following.

6.4.1  Edge  Detection  Based  on  Second  Derivatives 

The second derivative of a function measures its local curvature. The idea is that edges can be found at zero positions or, even better, at the zero crossings of the second derivatives of the image function, as illustrated in Fig. 6.10 for the 1D case. Since second derivatives generally tend to amplify image noise, some sort of presmoothing with a suitable low-pass filter is usually applied.
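The zero-crossing idea can be tried on a 1D signal. The sketch below (hypothetical class ZeroCrossings1D) estimates the second derivative with the kernel [1 −2 1], reports a crossing between samples k and k + 1 where the sign strictly changes, and keeps only those crossings where the first difference is large, as in Fig. 6.10. For brevity it omits the presmoothing step that is usually needed for noisy data.

```java
import java.util.ArrayList;
import java.util.List;

/** 1D sketch: edge positions as zero crossings of the discrete second derivative. */
final class ZeroCrossings1D {

    /** Returns every k where f'' changes sign between samples k and k+1 and the
     *  first difference |f[k+1] - f[k]| is at least minSlope. No presmoothing. */
    static List<Integer> find(double[] f, double minSlope) {
        int n = f.length;
        double[] d2 = new double[n];                  // d2[0] and d2[n-1] stay 0
        for (int i = 1; i < n - 1; i++)
            d2[i] = f[i - 1] - 2 * f[i] + f[i + 1];   // kernel [1 -2 1]
        List<Integer> edges = new ArrayList<>();
        for (int k = 1; k < n - 2; k++) {
            boolean signChange = d2[k] * d2[k + 1] < 0;   // strictly opposite signs
            if (signChange && Math.abs(f[k + 1] - f[k]) >= minSlope)
                edges.add(k);
        }
        return edges;
    }
}
```

For the ramp 0, 0, 0, 1, 3, 7, 9, 10, 10, 10 and minSlope = 2, the only reported crossing lies between samples 4 and 5, at the steepest part of the transition.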

A popular example is the “Laplacian-of-Gaussian” (LoG) operator [161], which combines Gaussian smoothing and computing the second derivatives (see the Laplacian filter in Sec. 6.6.1) into a single linear filter. The example in Fig. 6.11 shows that the edges produced by the LoG operator are more precisely localized than the ones delivered by the Prewitt and Sobel operators, and the amount of “clutter” is comparatively small. Details about the LoG operator and a comprehensive survey of common edge operators can be found in [203, Ch. 4] and [165].
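A LoG kernel can be obtained by sampling the Laplacian of a 2D Gaussian, $\nabla^2 G_\sigma(x,y) = -\frac{1}{\pi\sigma^4}\bigl(1 - \frac{x^2+y^2}{2\sigma^2}\bigr)\, e^{-(x^2+y^2)/(2\sigma^2)}$. The following sketch does this; the truncation radius of $\lceil 3\sigma \rceil$ and the final mean subtraction (so that the sampled kernel sums to zero and gives no response on flat regions) are assumptions of this sketch, not details taken from [161].

```java
/** Sampled Laplacian-of-Gaussian kernel, shifted to zero sum (sketch). */
final class LogKernel {

    static double[][] make(double sigma) {
        int r = (int) Math.ceil(3 * sigma);        // assumed truncation radius
        int size = 2 * r + 1;
        double[][] h = new double[size][size];
        double s2 = sigma * sigma, sum = 0;
        for (int j = -r; j <= r; j++) {
            for (int i = -r; i <= r; i++) {
                double q = (i * i + j * j) / (2 * s2);
                double val = -(1 - q) * Math.exp(-q) / (Math.PI * s2 * s2);  // LoG at (i, j)
                h[j + r][i + r] = val;
                sum += val;
            }
        }
        double mean = sum / (size * size);
        for (double[] row : h)                     // remove the residual mean so that the
            for (int i = 0; i < size; i++)         // kernel gives zero response on flat areas
                row[i] -= mean;
        return h;
    }
}
```

Convolving an image with this kernel and marking the zero crossings of the result (where the local slope is large) gives LoG edges.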

6.4.2  Edges  at  Different  Scales 

Unfortunately, the results of the simple edge operators we have discussed so far often deviate from what we as humans perceive as important edges. The two main reasons for this are:
•  First, edge operators only respond to local intensity differences, while our visual system is able to extend edges across areas of minimal or vanishing contrast.

•  Second,  edges  exist  not  at  a  single  fixed  resolution  or  at  a  certain 
scale  but  over  a  whole  range  of  different  scales. 

Typical small edge operators, such as the Sobel operator, can only respond to intensity differences that occur within their 3 × 3 pixel filter regions. To recognize edge-like events over a greater horizon, we would either need larger edge operators (with correspondingly large filters) or have to use the original (small) operators on reduced (i.e., scaled) images. This is the principal idea of “multiresolution” techniques (also referred to as “hierarchical” or “pyramid” techniques), which have traditionally been used in many image-processing applications [41, 151]. In the context of edge detection, this typically amounts to detecting edges at various scale levels first and then deciding which edge (if any) at which scale level is dominant at each image position.
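A minimal multiresolution structure can be built by repeatedly halving the image. The sketch below uses plain 2 × 2 block averaging as its combined low-pass and subsampling step, which is a simplification of the smoothing filters typically used for image pyramids; a 3 × 3 operator applied at level k then covers a region roughly $3 \cdot 2^k$ pixels wide in the original image. The class name Pyramid is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Simple image pyramid by repeated 2x2 block averaging (sketch). */
final class Pyramid {

    /** Halves width and height; an odd last row or column is dropped. */
    static double[][] reduce(double[][] I) {
        int h = I.length / 2, w = I[0].length / 2;
        double[][] R = new double[h][w];
        for (int v = 0; v < h; v++)
            for (int u = 0; u < w; u++)
                R[v][u] = (I[2 * v][2 * u] + I[2 * v][2 * u + 1]
                         + I[2 * v + 1][2 * u] + I[2 * v + 1][2 * u + 1]) / 4;
        return R;
    }

    /** Level 0 is the original image; each further level has half the resolution. */
    static List<double[][]> build(double[][] I, int levels) {
        List<double[][]> pyramid = new ArrayList<>();
        pyramid.add(I);
        for (int k = 1; k < levels && I.length >= 2 && I[0].length >= 2; k++) {
            I = reduce(I);
            pyramid.add(I);
        }
        return pyramid;
    }
}
```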

6.4.3  From  Edges  to  Contours 

Whatever method is used for edge detection, the result is usually a continuous value for the edge strength at each image position and possibly also the angle of local edge orientation. How can this information be used, for example, to find larger image structures and contours of objects in particular?

Binary  edge  maps 

In many situations, the next step after edge enhancement (by some edge operator) is the selection of edge points, a binary decision about whether an image pixel is an edge point or not. The simplest method is to apply a threshold operation to the edge strength delivered by the edge operator using either a fixed or adaptive threshold value, which results in a binary edge image or “edge map”.

Fig. 6.10

Principle of edge detection with the second derivative: original function (a), first derivative (b), and second derivative (c). Edge points are located where the second derivative crosses through zero and the first derivative has a high magnitude.

In practice, edge maps hardly ever contain perfect contours but instead many small, unconnected contour fragments, interrupted at positions of insufficient edge strength. After thresholding, the empty positions of course contain no edge information at all that could possibly be used in a subsequent step, such as for linking adjacent edge segments. Despite this weakness, global thresholding is often used at this point because of its simplicity, and some common postprocessing methods, such as the Hough transform (see Ch. 8), can cope well with incomplete edge maps.
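Global thresholding of an edge-strength map is straightforward. The sketch below offers a fixed threshold and, as one simple image-dependent choice (an assumption, not a method prescribed by the text), a threshold set to a fraction of the maximum edge strength; locally adaptive thresholds would be another option. The class name EdgeMap is hypothetical.

```java
/** Binary edge map from an edge-strength map by global thresholding (sketch). */
final class EdgeMap {

    /** Marks every pixel whose edge strength is at least t. */
    static boolean[][] threshold(double[][] E, double t) {
        boolean[][] B = new boolean[E.length][E[0].length];
        for (int v = 0; v < E.length; v++)
            for (int u = 0; u < E[0].length; u++)
                B[v][u] = E[v][u] >= t;
        return B;
    }

    /** Image-dependent variant (assumption): t = fraction * max(E). */
    static boolean[][] thresholdRelative(double[][] E, double fraction) {
        double max = 0;
        for (double[] row : E)
            for (double e : row)
                max = Math.max(max, e);
        return threshold(E, fraction * max);
    }
}
```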

Contour  following 

The idea of tracing contours sequentially along the discovered edge points is not uncommon and appears quite simple in principle. Starting from an image point with high edge strength, the edge is followed iteratively in both directions until the two traces meet and a closed contour is formed. Unfortunately, there are several obstacles that make this task more difficult than it seems at first, including the following:

•  edges  may  end  in  regions  of  vanishing  intensity  gradient, 

•  crossing  edges  lead  to  ambiguities,  and 

•  contours  may  branch  into  several  directions. 

Because of these problems, contour following usually is not applied to original images or continuous-valued edge images except in very simple situations, such as when there is a clear separation between objects (foreground) and the background. Tracing contours in segmented binary images is much simpler, of course (see Ch. 10).


6.5  Canny  Edge  Operator 

The operator proposed by Canny [42] is widely used and still considered “state of the art” in edge detection. The method tries to reach three main goals: (a) to minimize the number of false edge points, (b) to achieve good localization of edges, and (c) to deliver only a single mark on each edge. These properties are usually not achieved with simple edge operators (mostly based on first derivatives and subsequent thresholding).

At  its  core,  the  Canny  “filter”  is  a  gradient  method  (based  on 
first  derivatives;  see  Sec.  6.2),  but  it  uses  the  zero  crossings  of  second 
derivatives  for  precise  edge  localization.4  In  this  regard,  the  method 
is  similar  to  edge  detectors  that  are  based  on  the  second  derivatives 
of  the  image  function  [161]. 

Fully  implemented,  the  Canny  detector  uses  a  set  of  relatively 
large,  oriented  filters  at  multiple  image  resolutions  and  merges  the 

4 The zero crossings of a function's second derivative are found where the first derivative exhibits a local maximum or minimum.
[Fig. 6.11 panels: Original; Roberts operator; Prewitt operator; Sobel operator]

Fig. 6.11

Comparison of various edge operators. Important criteria for the quality of edge results are the amount of “clutter” (irrelevant edge elements) and the connectedness of dominant edges. The Roberts operator responds to very small edge structures because of the small size of its filters. The similarity of the Prewitt and Sobel operators is manifested in the corresponding results. The edge map produced by the Canny operator is substantially cleaner than those of the simpler operators, even for a fixed and relatively small scale value $\sigma$.


individual results into a common edge map. It is quite common, however, to use only a single-scale implementation of the algorithm with an adjustable filter radius (smoothing parameter $\sigma$), which is nevertheless superior to most of the simple edge operators (see Fig. 6.11). In addition, the algorithm yields not only a binary edge map but also connected chains of edge pixels, which greatly simplifies the subsequent processing steps. Thus, even in its basic (single-scale) form, the Canny operator is often preferred over other edge detection methods.

In its basic (single-scale) form, the Canny operator performs the following steps (stated more precisely in Algs. 6.1-6.2):

1. Pre-processing: Smooth the image with a Gaussian filter of width $\sigma$, which specifies the scale level of the edge detector. Calculate the x/y gradient vector at each position of the filtered image and determine the local gradient magnitude and orientation.

2. Edge localization: Isolate local maxima of gradient magnitude by “non-maximum suppression” along the local gradient direction.
Fig. 6.12

Non-maximum suppression of gradient magnitude. The gradient direction at position $(u, v)$ is coarsely quantized to four discrete orientations $s_\theta \in \{0, 1, 2, 3\}$ (a). Only pixels where the gradient magnitude $E_\mathrm{mag}(u, v)$ is a local maximum in the gradient direction (i.e., perpendicular to the edge tangent) are taken as candidate edge points (b). The gradient magnitude at all other points is set (suppressed) to zero.


3. Edge tracing and hysteresis thresholding: Collect sets of connected edge pixels from the local maxima by applying “hysteresis thresholding”.

6.5.1  Pre-processing 

The original intensity image $I$ is first smoothed with a Gaussian filter kernel $H^{G,\sigma}$; its width $\sigma$ specifies the spatial scale at which edges are to be detected (see Alg. 6.1, lines 2-10). Subsequently, first-order difference filters are applied to the smoothed image $\bar{I}$ to calculate the components $\bar{I}_x, \bar{I}_y$ of the local gradient vectors (Alg. 6.1, lines 3-4).⁵ Then the local magnitude $E_\mathrm{mag}$ is calculated as the norm of the corresponding gradient vector (Alg. 6.1, line 11). In view of the subsequent thresholding, it may be helpful to normalize the edge magnitude values to a standard range (e.g., to [0, 100]).
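The pre-processing step can be sketched in plain Java as follows. It smooths with a separable, normalized Gaussian (truncated at $\lceil 3\sigma \rceil$ and with replicated borders, both assumptions), computes $\bar{I}_x$ and $\bar{I}_y$ as central differences $0.5\,(\bar{I}(u{+}1,v) - \bar{I}(u{-}1,v))$ and $0.5\,(\bar{I}(u,v{+}1) - \bar{I}(u,v{-}1))$, i.e., the kernel [−0.5 0 0.5] applied without mirroring, and takes the gradient norm as $E_\mathrm{mag}$. The class name CannyPre is hypothetical.

```java
/** Canny pre-processing sketch: Gaussian blur, x/y gradient and gradient magnitude. */
final class CannyPre {

    /** Normalized 1D Gaussian kernel, truncated at radius ceil(3 sigma) (assumption). */
    static double[] gaussKernel1d(double sigma) {
        int r = (int) Math.ceil(3 * sigma);
        double[] k = new double[2 * r + 1];
        double sum = 0;
        for (int i = -r; i <= r; i++) {
            k[i + r] = Math.exp(-(i * i) / (2 * sigma * sigma));
            sum += k[i + r];
        }
        for (int i = 0; i < k.length; i++) k[i] /= sum;        // unit sum
        return k;
    }

    /** Pixel I[v][u] with coordinates clamped to the image (border replication). */
    static double at(double[][] I, int u, int v) {
        int h = I.length, w = I[0].length;
        return I[Math.max(0, Math.min(h - 1, v))][Math.max(0, Math.min(w - 1, u))];
    }

    /** Separable Gaussian smoothing: horizontal pass, then vertical pass. */
    static double[][] blur(double[][] I, double sigma) {
        double[] k = gaussKernel1d(sigma);
        int r = k.length / 2, h = I.length, w = I[0].length;
        double[][] T = new double[h][w], B = new double[h][w];
        for (int v = 0; v < h; v++)
            for (int u = 0; u < w; u++)
                for (int i = -r; i <= r; i++) T[v][u] += k[i + r] * at(I, u + i, v);
        for (int v = 0; v < h; v++)
            for (int u = 0; u < w; u++)
                for (int j = -r; j <= r; j++) B[v][u] += k[j + r] * at(T, u, v + j);
        return B;
    }

    /** Returns {Ix, Iy, Emag} for the smoothed image Ib, using central differences. */
    static double[][][] gradient(double[][] Ib) {
        int h = Ib.length, w = Ib[0].length;
        double[][] Ix = new double[h][w], Iy = new double[h][w], Emag = new double[h][w];
        for (int v = 0; v < h; v++)
            for (int u = 0; u < w; u++) {
                Ix[v][u] = 0.5 * (at(Ib, u + 1, v) - at(Ib, u - 1, v));
                Iy[v][u] = 0.5 * (at(Ib, u, v + 1) - at(Ib, u, v - 1));
                Emag[v][u] = Math.hypot(Ix[v][u], Iy[v][u]);    // norm of the gradient
            }
        return new double[][][] {Ix, Iy, Emag};
    }
}
```

To normalize $E_\mathrm{mag}$ to [0, 100] before thresholding, every value can simply be multiplied by 100 / max($E_\mathrm{mag}$).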

6.5.2  Edge  localization 

Candidate edge pixels are isolated by local “non-maximum suppression” of the edge magnitude $E_\mathrm{mag}$. In this step, only those pixels are preserved that represent a local maximum along the 1D profile in the direction of the gradient, that is, perpendicular to the edge tangent (see Fig. 6.12). While the gradient may point in any continuous direction, only four discrete directions are typically used to facilitate efficient processing. The pixel at position $(u, v)$ is only retained as an edge candidate if its gradient magnitude is not smaller than either of its two immediate neighbors in the direction specified by the gradient vector $(d_x, d_y)$ at position $(u, v)$. If a pixel is not a local maximum, its edge magnitude value is set to zero (i.e., “suppressed”). In Alg. 6.1, the non-maximum suppressed edge values are stored in the map $E_\mathrm{nms}$.
The problem of finding the discrete orientation $s_\theta = 0, \ldots, 3$ for a given gradient vector $\boldsymbol{q} = (d_x, d_y)$ is illustrated in Fig. 6.13. This task is simple if the corresponding angle $\theta = \tan^{-1}(d_y/d_x)$ is known, but at this point the use of trigonometric functions is typically avoided for efficiency reasons. The octant that corresponds to $\boldsymbol{q}$ can be inferred directly from the signs and magnitudes of the components $d_x, d_y$; however, the necessary decision rules are quite complex. Much simpler rules apply if the coordinate system and gradient vector $\boldsymbol{q}$ are

5 See also Sec. C.3.1 in the Appendix.
1: CannyEdgeDetector(I, σ, t_hi, t_lo)
   Input: I, a grayscale image of size M × N; σ, scale (radius of Gaussian filter H^{G,σ}); t_hi, t_lo, hysteresis thresholds (t_hi > t_lo).
   Returns a binary edge map of size M × N.
2:   Ī ← I ∗ H^{G,σ}                                  ▷ blur with Gaussian of width σ
3:   Ī_x ← Ī ∗ [−0.5 0 0.5]                            ▷ x-gradient
4:   Ī_y ← Ī ∗ [−0.5 0 0.5]^T                          ▷ y-gradient
5:   (M, N) ← Size(I)
6:   Create maps:
7:     E_mag : M × N → ℝ                               ▷ gradient magnitude
8:     E_nms : M × N → ℝ                               ▷ maximum magnitude
9:     E_bin : M × N → {0, 1}                          ▷ binary edge pixels
10:  for all image coordinates (u, v) ∈ M × N do
11:    E_mag(u, v) ← [Ī_x²(u, v) + Ī_y²(u, v)]^(1/2)
12:    E_nms(u, v) ← 0
13:    E_bin(u, v) ← 0
14:  for u ← 1, …, M − 2 do
15:    for v ← 1, …, N − 2 do
16:      d_x ← Ī_x(u, v),  d_y ← Ī_y(u, v)
17:      s_θ ← GetOrientationSector(d_x, d_y)          ▷ Alg. 6.2
18:      if IsLocalMax(E_mag, u, v, s_θ, t_lo) then    ▷ Alg. 6.2
19:        E_nms(u, v) ← E_mag(u, v)                   ▷ only keep local maxima
20:  for u ← 1, …, M − 2 do
21:    for v ← 1, …, N − 2 do
22:      if (E_nms(u, v) ≥ t_hi) ∧ (E_bin(u, v) = 0) then
23:        TraceAndThreshold(E_nms, E_bin, u, v, t_lo) ▷ Alg. 6.2
24:  return E_bin.


Fig. 6.13

Discrete gradient directions. In (a), calculating the octant for a given orientation vector $\boldsymbol{q} = (d_x, d_y)$ requires a relatively complex decision. Alternatively (b), if $\boldsymbol{q}$ is rotated by $\pi/8$ to $\boldsymbol{q}'$, the corresponding octant can be found directly from the components of $\boldsymbol{q}' = (d'_x, d'_y)$ without the need to calculate the actual angle. Orientation vectors in the other octants are mirrored to octants $s_\theta = 0, 1, 2, 3$.


Alg.  6.1 

Canny  edge  detector  for 
grayscale  images. 


rotated by $\pi/8$, as illustrated in Fig. 6.13(b). This step is implemented by the function GetOrientationSector() in Alg. 6.2.⁶
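The two procedures used for non-maximum suppression translate almost line by line into Java. The sketch below follows GetOrientationSector and IsLocalMax of Alg. 6.2, with the rotation constants cos(π/8) and sin(π/8) computed once. The magnitude map is indexed Emag[v][u] (with v pointing down), and callers must stay at least one pixel away from the image border, as the loops in Alg. 6.1 do. The class name CannyNms is hypothetical.

```java
/** Sketch of GetOrientationSector and IsLocalMax (Alg. 6.2). */
final class CannyNms {

    static final double COS = Math.cos(Math.PI / 8), SIN = Math.sin(Math.PI / 8);

    /** Discrete octant s in {0, 1, 2, 3} of the gradient (dx, dy), without atan2. */
    static int getOrientationSector(double dx, double dy) {
        double dxr = COS * dx - SIN * dy;   // rotate (dx, dy) by pi/8
        double dyr = SIN * dx + COS * dy;
        if (dyr < 0) {                      // mirror to octants 0..3
            dxr = -dxr;
            dyr = -dyr;
        }
        if (dxr >= 0 && dxr >= dyr) return 0;
        if (dxr >= 0 && dxr < dyr) return 1;
        if (dxr < 0 && -dxr < dyr) return 2;
        return 3;                           // dxr < 0 && -dxr >= dyr
    }

    /** True if Emag at (u, v) is at least tlo and a local maximum along sector s. */
    static boolean isLocalMax(double[][] Emag, int u, int v, int s, double tlo) {
        double mC = Emag[v][u];
        if (mC < tlo) return false;
        double mL, mR;
        switch (s) {
            case 0:  mL = Emag[v][u - 1];     mR = Emag[v][u + 1];     break;
            case 1:  mL = Emag[v - 1][u - 1]; mR = Emag[v + 1][u + 1]; break;
            case 2:  mL = Emag[v - 1][u];     mR = Emag[v + 1][u];     break;
            default: mL = Emag[v + 1][u - 1]; mR = Emag[v - 1][u + 1]; break; // s == 3
        }
        return mL <= mC && mC >= mR;
    }
}
```

For example, a purely horizontal gradient (1, 0) falls into sector 0 and is compared with its left and right neighbors, while (0, 1) falls into sector 2 and is compared with the pixels above and below.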

6.5.3  Edge  tracing  and  hysteresis  thresholding 

In  the  final  step,  sets  of  connected  edge  points  are  collected  from  the 
magnitude  values  that  remained  unsuppressed  in  the  previous  oper- 

6 Note that the elements of the rotation matrix in Alg. 6.2 (line 2) are constants, and thus no repeated use of trigonometric functions is required.


Alg.  6.2 

Procedures  used  in  Alg. 
6.1  (Canny  edge  detector). 


1: GetOrientationSector(d_x, d_y)
   Returns the discrete octant s_θ for the orientation vector (d_x, d_y)^T. See Fig. 6.13 for an illustration.
2:   (d'_x, d'_y)^T ← [cos(π/8)  −sin(π/8); sin(π/8)  cos(π/8)] · (d_x, d_y)^T   ▷ rotate (d_x, d_y) by π/8
3:   if d'_y < 0 then
4:     d'_x ← −d'_x,  d'_y ← −d'_y                      ▷ mirror to octants 0, …, 3
5:   s_θ ← 0  if (d'_x ≥ 0) ∧ (d'_x ≥ d'_y)
           1  if (d'_x ≥ 0) ∧ (d'_x < d'_y)
           2  if (d'_x < 0) ∧ (−d'_x < d'_y)
           3  if (d'_x < 0) ∧ (−d'_x ≥ d'_y)
6:   return s_θ                                         ▷ sector index s_θ ∈ {0, 1, 2, 3}

7: IsLocalMax(E_mag, u, v, s_θ, t_lo)
   Determines if the gradient magnitude E_mag is a local maximum at position (u, v) in direction s_θ ∈ {0, 1, 2, 3}.
8:   m_C ← E_mag(u, v)
9:   if m_C < t_lo then
10:    return false
11:  else
12:    m_L ← E_mag(u−1, v)     if s_θ = 0
             E_mag(u−1, v−1)   if s_θ = 1
             E_mag(u, v−1)     if s_θ = 2
             E_mag(u−1, v+1)   if s_θ = 3
13:    m_R ← E_mag(u+1, v)     if s_θ = 0
             E_mag(u+1, v+1)   if s_θ = 1
             E_mag(u, v+1)     if s_θ = 2
             E_mag(u+1, v−1)   if s_θ = 3
14:    return (m_L ≤ m_C) ∧ (m_C ≥ m_R).

15: TraceAndThreshold(E_nms, E_bin, u_0, v_0, t_lo)
    Recursively collects and marks all pixels of an edge that are 8-connected to (u_0, v_0) and have a gradient magnitude of at least t_lo.
16:  E_bin(u_0, v_0) ← 1                                ▷ mark (u_0, v_0) as an edge pixel
17:  u_L ← max(u_0 − 1, 0)                              ▷ limit to image bounds
18:  u_R ← min(u_0 + 1, M − 1)
19:  v_T ← max(v_0 − 1, 0)
20:  v_B ← min(v_0 + 1, N − 1)
21:  for u ← u_L, …, u_R do
22:    for v ← v_T, …, v_B do
23:      if (E_nms(u, v) ≥ t_lo) ∧ (E_bin(u, v) = 0) then
24:        TraceAndThreshold(E_nms, E_bin, u, v, t_lo)
25:  return


ation.  This  is  done  with  a  technique  called  “hysteresis  thresholding” 
using two different threshold values $t_\mathrm{hi}, t_\mathrm{lo}$ (with $t_\mathrm{hi} > t_\mathrm{lo}$). The image is scanned for pixels with edge magnitude $E_\mathrm{nms}(u,v) \ge t_\mathrm{hi}$. Whenever such a (previously unvisited) location is found, a new edge trace is started and all connected edge pixels $(u', v')$ are added to it as long as $E_\mathrm{nms}(u', v') \ge t_\mathrm{lo}$. Only those edge traces remain that contain at least one pixel with edge magnitude of at least $t_\mathrm{hi}$ and no pixels with edge magnitude less than $t_\mathrm{lo}$. This process (which is similar to
Fig. 6.14

Grayscale Canny edge operator details. Inverted gradient magnitude (a), detected edge points with connected edge tracks shown in distinctive colors (b). Details with gradient magnitude and detected edge points overlaid (c, d). Settings: $\sigma = 2.0$, $t_\mathrm{hi} = 20\%$, $t_\mathrm{lo} = 5\%$ (of the max. edge magnitude).


flood-fill region growing) is detailed in procedure TraceAndThreshold
in  Alg.  6.2.  Typical  threshold  values  for  8-bit  grayscale  images  are 
$t_\mathrm{hi} = 5.0$ and $t_\mathrm{lo} = 2.5$.
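Hysteresis thresholding is essentially a flood fill seeded at strong pixels. The sketch below combines the scan loop of Alg. 6.1 (lines 20-23) with TraceAndThreshold, but replaces the recursion by an explicit stack, a deliberate change because long edge chains can overflow the Java call stack. It also scans every pixel rather than only the interior, which makes no difference when the border values of $E_\mathrm{nms}$ are zero. The class name Hysteresis is hypothetical.

```java
import java.util.ArrayDeque;

/** Hysteresis thresholding with an explicit stack instead of recursion (sketch). */
final class Hysteresis {

    /** Returns the binary edge map for the suppressed magnitudes Enms[v][u]. */
    static boolean[][] apply(double[][] Enms, double tHi, double tLo) {
        int h = Enms.length, w = Enms[0].length;
        boolean[][] Ebin = new boolean[h][w];
        ArrayDeque<int[]> stack = new ArrayDeque<>();
        for (int v = 0; v < h; v++) {
            for (int u = 0; u < w; u++) {
                if (Enms[v][u] < tHi || Ebin[v][u]) continue;   // seeds: strong, unvisited
                Ebin[v][u] = true;
                stack.push(new int[] {u, v});
                while (!stack.isEmpty()) {                        // trace the connected edge
                    int[] p = stack.pop();
                    for (int y = Math.max(p[1] - 1, 0); y <= Math.min(p[1] + 1, h - 1); y++) {
                        for (int x = Math.max(p[0] - 1, 0); x <= Math.min(p[0] + 1, w - 1); x++) {
                            if (Enms[y][x] >= tLo && !Ebin[y][x]) {   // 8-connected neighbor
                                Ebin[y][x] = true;                     // mark before pushing
                                stack.push(new int[] {x, y});
                            }
                        }
                    }
                }
            }
        }
        return Ebin;
    }
}
```

Marking a pixel before it is pushed guarantees that each pixel enters the stack at most once, so the running time is linear in the number of pixels.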

Figure 6.14 illustrates the effectiveness of non-maximum suppression for localizing the edge centers and of edge linking with hysteresis thresholding. Results from the single-scale Canny detector are shown in Fig. 6.15 for different settings of $\sigma$ and fixed upper/lower threshold values $t_\mathrm{hi} = 20\%$, $t_\mathrm{lo} = 5\%$ (relative to the maximum gradient magnitude).


6.5.4  Additional  Information 

Due  to  the  long-lasting  popularity  of  the  Canny  operator,  additional 
descriptions  and  some  excellent  illustrations  can  be  found  at  various 
places  in  the  literature,  including  [89,  p.  719],  [232,  pp.  71-80],  and 
[166,  pp.  548-549].  An  edge  operator  similar  to  the  Canny  detector, 
but  based  on  a  set  of  recursive  filters,  is  described  in  [62].  While  the 
Canny  detector  was  originally  designed  for  grayscale  images,  modified 
versions  for  color  images  exist,  including  the  one  we  describe  in  the 
next  section. 
[Fig. 6.15 panels: Gradient magnitude ($E_\mathrm{mag}$) and Edge points, for $\sigma = 0.5$ and $\sigma = 1.0$ (a-d)]

Fig. 6.15

Results from the single-scale grayscale Canny edge operator (Algs. 6.1-6.2) for different values of $\sigma = 0.5, \ldots, 5.0$. Inverted gradient magnitude (left column) and detected edge points (right column). The detected edge points are linked to connected edge chains.

6.5.5  Implementation 


A complete implementation of the Canny edge detector for both grayscale and RGB color images can be found in the Java library for this book.⁷ A basic usage example is shown in Prog. 16.1 on p. 411.

7 Class CannyEdgeDetector in package imagingbook.pub.coloredge.
6.6  Edge  Sharpening 


Making images look sharper is a frequent task, for example, to make up for a lack of sharpness after scanning or scaling an image or to precompensate for a subsequent loss of sharpness in the course of printing or displaying an image. A common approach to image sharpening is to amplify the high-frequency image components, which are mainly responsible for the perceived sharpness of an image and the strongest of which occur at rapid intensity transitions. In the following, we describe two methods for artificial image sharpening that are based on techniques similar to edge detection and thus fit well in this chapter.


6.6.1  Edge  Sharpening  with  the  Laplacian  Filter 

A common method for localizing rapid intensity changes is to use filters based on the second derivatives of the image function. Figure 6.16 illustrates this idea on a 1D, continuous function $f(x)$. The second derivative $f''(x)$ of the step function shows a positive pulse at the lower end of the transition and a negative pulse at the upper end. The edge is sharpened by subtracting a certain fraction $w$ of the second derivative $f''(x)$ from the original function $f(x)$,

$$\check{f}(x) = f(x) - w \cdot f''(x) . \qquad (6.29)$$

Depending upon the weight factor $w > 0$, the expression in Eqn. (6.29) causes the intensity function to overshoot at both sides of an edge, thus exaggerating edges and increasing the perceived sharpness.
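For a sampled 1D signal, Eqn. (6.29) becomes a simple loop if $f''$ is estimated with the kernel [1 −2 1]. The sketch below (hypothetical class Sharpen1D) leaves the two end samples unchanged, which is an assumption, and makes the overshoot on both sides of a step visible.

```java
/** 1D sharpening f~(x) = f(x) - w * f''(x), Eqn. (6.29), for a sampled signal. */
final class Sharpen1D {

    static double[] sharpen(double[] f, double w) {
        double[] g = f.clone();                              // end samples stay unchanged
        for (int i = 1; i < f.length - 1; i++) {
            double d2 = f[i - 1] - 2 * f[i] + f[i + 1];      // f'' estimated with [1 -2 1]
            g[i] = f[i] - w * d2;
        }
        return g;
    }
}
```

For the step 0, 0, 0, 10, 10, 10 and w = 0.5 it returns 0, 0, −5, 15, 10, 10: an undershoot below the lower level and an overshoot above the upper level.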


Laplacian  operator 

Sharpening of a 2D function can be accomplished with the second derivatives in the horizontal and vertical directions, combined by the so-called Laplacian operator. The Laplacian operator $\nabla^2$ of a 2D function $f(x, y)$ is defined as the sum of the second partial derivatives along the x and y directions:

$$(\nabla^2 f)(x,y) = \frac{\partial^2 f}{\partial x^2}(x,y) + \frac{\partial^2 f}{\partial y^2}(x,y) . \qquad (6.30)$$

Similar to the first derivatives (see Sec. 6.2.2), the second derivatives of a discrete image function can also be estimated with a set of simple linear filters. Again, several versions have been proposed. For example, the two 1D filters

$$\frac{\partial^2 f}{\partial x^2} \approx H^L_x = \begin{bmatrix} 1 & -2 & 1 \end{bmatrix} \quad \text{and} \quad \frac{\partial^2 f}{\partial y^2} \approx H^L_y = \begin{bmatrix} 1 \\ -2 \\ 1 \end{bmatrix} , \qquad (6.31)$$

for estimating the second derivatives along the x and y directions, respectively, combine to make the 2D Laplacian filter

$$H^L = H^L_x + H^L_y = \begin{bmatrix} 0 & 1 & 0 \\ 1 & -4 & 1 \\ 0 & 1 & 0 \end{bmatrix} . \qquad (6.32)$$

Fig. 6.16

Edge sharpening with the second derivative. The original intensity function $f(x)$, first derivative $f'(x)$, second derivative $f''(x)$, and sharpened intensity function $\check{f}(x) = f(x) - w \cdot f''(x)$ are shown ($w$ is a weighting factor).


Figure 6.17 shows an example of applying the Laplacian filter $H^L$ to a grayscale image, where the pairs of positive-negative peaks at both sides of each edge are clearly visible. The filter appears almost isotropic despite the coarse approximation with the small filter kernels.

Notice that $H^L$ in Eqn. (6.32) is not separable in the usual sense (as described in Sec. 5.3.3) but, because of the linearity property of convolution (Eqns. (5.21) and (5.23)), it can be expressed (and computed) as the sum of two 1D filters,

$$I * H^L = I * (H^L_x + H^L_y) = (I * H^L_x) + (I * H^L_y) = I_{xx} + I_{yy} . \qquad (6.33)$$
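Eqn. (6.33) can be checked numerically. The sketch below computes the Laplacian once with the 3 × 3 kernel of Eqn. (6.32) and once as $I_{xx} + I_{yy}$ from the two 1D filters of Eqn. (6.31); since all kernels involved are symmetric, correlation and convolution coincide, and both variants give identical results on interior pixels. Leaving border pixels at 0 is an assumption; the class name Laplacian is hypothetical.

```java
/** The 3x3 Laplacian equals the sum of two 1D second-derivative filters (sketch). */
final class Laplacian {

    /** Direct 3x3 kernel [0 1 0; 1 -4 1; 0 1 0] on interior pixels. */
    static double[][] laplacian(double[][] I) {
        int h = I.length, w = I[0].length;
        double[][] L = new double[h][w];
        for (int v = 1; v < h - 1; v++)
            for (int u = 1; u < w - 1; u++)
                L[v][u] = I[v - 1][u] + I[v + 1][u] + I[v][u - 1] + I[v][u + 1] - 4 * I[v][u];
        return L;
    }

    /** I_xx + I_yy using the 1D kernels [1 -2 1] and [1 -2 1]^T. */
    static double[][] sumOfSecondDerivatives(double[][] I) {
        int h = I.length, w = I[0].length;
        double[][] L = new double[h][w];
        for (int v = 1; v < h - 1; v++)
            for (int u = 1; u < w - 1; u++) {
                double ixx = I[v][u - 1] - 2 * I[v][u] + I[v][u + 1];   // horizontal
                double iyy = I[v - 1][u] - 2 * I[v][u] + I[v + 1][u];   // vertical
                L[v][u] = ixx + iyy;
            }
        return L;
    }
}
```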


Analogous to the gradient filters (for estimating the first derivatives), the sum of the coefficients is zero in any Laplace filter, such that its response is zero in areas of constant (flat) intensity (Fig. 6.17). Other common variants of 3 × 3 pixel Laplace filters are

$$H^L_8 = \begin{bmatrix} 1 & 1 & 1 \\ 1 & -8 & 1 \\ 1 & 1 & 1 \end{bmatrix} \quad \text{or} \quad H^L_{12} = \begin{bmatrix} 1 & 2 & 1 \\ 2 & -12 & 2 \\ 1 & 2 & 1 \end{bmatrix} . \qquad (6.34)$$


[Fig. 6.17 panels: (a) $I$, (b) $I_{xx}$, (c) $I_{yy}$, (d) $\nabla^2 I$]

Fig. 6.17

Results of Laplace filter $H^L$: synthetic test image $I$ (a), second partial derivative $I_{xx} = \partial^2 I/\partial x^2$ in the horizontal direction (b), second partial derivative $I_{yy} = \partial^2 I/\partial y^2$ in the vertical direction (c), and Laplace filter $\nabla^2 I = I_{xx} + I_{yy}$ (d). Intensities in (b-d) are scaled such that maximally negative and positive values are shown as black and white, respectively, and zero values are gray.


Sharpening 

To perform the actual sharpening, as described by Eqn. (6.29) for the 1D case, we first apply a Laplacian filter $H^L$ to the image $I$ and then subtract a fraction of the result from the original image,

$$\check{I} \leftarrow I - w \cdot (H^L * I) . \qquad (6.35)$$

The factor $w$ specifies the proportion of the Laplacian component and thus the sharpening strength. The proper choice of $w$ also depends on the specific Laplacian filter used in Eqn. (6.35), since none of the aforementioned filters is normalized.
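A direct implementation of Eqn. (6.35) with the kernel $H^L$ of Eqn. (6.32) might look as follows. Copying the border pixels unchanged is an assumption of this sketch (class name LaplaceSharpen is hypothetical); note also that the result can leave the original intensity range and would have to be clamped before converting back to an 8-bit image.

```java
/** Laplacian sharpening, Eqn. (6.35), with H^L = [0 1 0; 1 -4 1; 0 1 0] (sketch). */
final class LaplaceSharpen {

    static double[][] sharpen(double[][] I, double w) {
        int h = I.length, width = I[0].length;
        double[][] R = new double[h][];
        for (int v = 0; v < h; v++) R[v] = I[v].clone();     // borders are copied unchanged
        for (int v = 1; v < h - 1; v++)
            for (int u = 1; u < width - 1; u++) {
                double lap = I[v - 1][u] + I[v + 1][u] + I[v][u - 1] + I[v][u + 1] - 4 * I[v][u];
                R[v][u] = I[v][u] - w * lap;                   // subtract weighted Laplacian
            }
        return R;
    }
}
```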

The application to a real grayscale image using the filter $H^L$ (Eqn. (6.32)) and $w = 1.0$ is shown in Fig. 6.18.

As we can expect from second-order derivatives, the Laplacian filter is fairly sensitive to image noise, which can be reduced (as is commonly done in edge detection with first derivatives) by previous smoothing, such as with a Gaussian filter (see also Sec. 6.4.1).
Fig. 6.18

Edge sharpening with the Laplacian filter. Original image with a horizontal profile taken from the marked line (a, b), result of Laplacian filter $H^L$ (c, d), and sharpened image with sharpening factor $w = 1.0$ (e, f).


6.6.2  Unsharp  Masking 

“Unsharp masking” (USM) is a technique for edge sharpening that is particularly popular in astronomy, digital printing, and many other areas of image processing. The term originates from classical photography, where the sharpness of an image was optically enhanced by combining it with a smoothed (“unsharp”) copy. This process is in principle the same for digital images.

Process 

The  first  step  in  the  USM  filter  is  to  subtract  a  smoothed  version  of 
the  image  from  the  original,  which  enhances  the  edges.  The  result 
is  called  the  “mask”.  In  analog  photography,  the  required  smoothing 
was  achieved  by  simply  defocusing  the  lens.  Subsequently,  the  mask 
is  again  added  to  the  original,  such  that  the  edges  in  the  image  are 
sharpened.  In  summary,  the  steps  involved  in  USM  filtering  are: 

1. The mask image $M$ is generated by subtracting (from the original image $I$) a smoothed version of $I$, obtained by filtering with $H$, that is,

$$M \leftarrow I - (I * H) = I - \tilde{I} . \qquad (6.36)$$

The kernel $H$ of the smoothing filter is assumed to be normalized (see Sec. 5.2.5).
2. To obtain the sharpened image $\check{I}$, the mask $M$ is added to the original image $I$, weighted by the factor $a$, which controls the amount of sharpening,

$$\check{I} \leftarrow I + a \cdot M , \qquad (6.37)$$

and thus (by inserting from Eqn. (6.36))

$$\check{I} \leftarrow I + a \cdot (I - \tilde{I}) = (1 + a) \cdot I - a \cdot \tilde{I} . \qquad (6.38)$$
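Eqns. (6.36)-(6.38) lead to the following sketch of unsharp masking with a Gaussian smoothing kernel. The class name UsmSketch, the truncation of the kernel at $\lceil 3\sigma \rceil$ and the border replication are assumptions; the sharpened image is computed in the form $(1 + a) \cdot I - a \cdot \tilde{I}$ of Eqn. (6.38).

```java
/** Unsharp masking with a separable Gaussian smoothing filter (sketch). */
final class UsmSketch {

    /** Separable Gaussian blur with a normalized kernel, truncated at ceil(3 sigma)
     *  (assumption); image borders are replicated. */
    static double[][] blur(double[][] I, double sigma) {
        int r = (int) Math.ceil(3 * sigma);
        double[] k = new double[2 * r + 1];
        double sum = 0;
        for (int i = -r; i <= r; i++) {
            k[i + r] = Math.exp(-(i * i) / (2 * sigma * sigma));
            sum += k[i + r];
        }
        for (int i = 0; i < k.length; i++) k[i] /= sum;       // unit sum, cf. Sec. 5.2.5
        int h = I.length, w = I[0].length;
        double[][] T = new double[h][w], B = new double[h][w];
        for (int v = 0; v < h; v++)                            // horizontal pass
            for (int u = 0; u < w; u++)
                for (int i = -r; i <= r; i++)
                    T[v][u] += k[i + r] * I[v][Math.max(0, Math.min(w - 1, u + i))];
        for (int v = 0; v < h; v++)                            // vertical pass
            for (int u = 0; u < w; u++)
                for (int i = -r; i <= r; i++)
                    B[v][u] += k[i + r] * T[Math.max(0, Math.min(h - 1, v + i))][u];
        return B;
    }

    /** Eqn. (6.38): (1 + a) * I - a * smoothed(I). */
    static double[][] sharpen(double[][] I, double sigma, double a) {
        double[][] Is = blur(I, sigma);                        // smoothed image
        int h = I.length, w = I[0].length;
        double[][] R = new double[h][w];
        for (int v = 0; v < h; v++)
            for (int u = 0; u < w; u++)
                R[v][u] = (1 + a) * I[v][u] - a * Is[v][u];
        return R;
    }
}
```

Across a step edge, the result overshoots above the bright level and undershoots below the dark level, while flat regions far from the edge stay unchanged.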

Smoothing  filter 

In principle, any smoothing filter could be used for the kernel $H$ in Eqn. (6.36), but Gaussian filters $H^{G,\sigma}$ with variable radius $\sigma$ are most common (see also Sec. 5.2.7). Typical parameter values are 1 to 20 for $\sigma$ and 0.2 to 4.0 (equivalent to 20% to 400%) for the sharpening factor $a$.

Figure 6.19 shows two examples of USM filters using Gaussian smoothing filters with different radii $\sigma$.


Extensions 

The advantages of the USM filter over the Laplace filter are reduced noise sensitivity due to the involved smoothing and improved controllability through the parameters $\sigma$ (spatial extent) and $a$ (sharpening strength).

Of  course  the  USM  filter  responds  not  only  to  real  edges  but  to 
some  extent  to  any  intensity  transition,  and  thus  potentially  increases 
any  visible  noise  in  continuous  image  regions.  Some  implementations 
(e.g.,  Adobe  Photoshop)  therefore  provide  an  additional  threshold  pa¬ 
rameter  tc  to  specify  the  minimum  local  contrast  required  to  perform 
edge  sharpening.  Sharpening  is  only  applied  if  the  local  contrast  at 
position  (u,v),  expressed,  for  example,  by  the  gradient  magnitude 
|V/|  (Eqn.  (6.5)),  is  greater  than  that  threshold.  Otherwise,  that 
pixel  remains  unmodified,  that  is, 


7(r,  v)  <(— 


I(u,  v)  +  a  •  M(u,  v ) 
/(r,  v) 


for  | V/|(r,  v)  >  £c, 


otherwise. 


(6.39) 


Unlike the original USM filter (Eqn. (6.37)), this extended version
is no longer a linear filter. On color images, the USM filter is
usually applied to all color channels with identical parameter
settings.
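A minimal sketch of the thresholded variant in Eqn. (6.39) follows. It takes the original image and its blurred version as float[][] arrays and estimates the local contrast with central differences; this gradient estimate, the clamped borders, and the names (ThresholdUsmSketch, sharpen()) are assumptions made for illustration. The per-pixel if-else decision is what makes this filter nonlinear.

```java
/** Sketch of the thresholded USM of Eqn. (6.39); names and gradient estimate are assumptions. */
final class ThresholdUsmSketch {

    /**
     * I: original image, Ib: blurred image (I * H), a: sharpening factor,
     * tc: minimum local contrast (gradient magnitude) required for sharpening.
     */
    static float[][] sharpen(float[][] I, float[][] Ib, double a, double tc) {
        int N = I.length, M = I[0].length;
        float[][] R = new float[N][M];
        for (int v = 0; v < N; v++)
            for (int u = 0; u < M; u++) {
                // central differences with clamped borders (assumed gradient estimate)
                double dx = 0.5 * (I[v][Math.min(u + 1, M - 1)] - I[v][Math.max(u - 1, 0)]);
                double dy = 0.5 * (I[Math.min(v + 1, N - 1)][u] - I[Math.max(v - 1, 0)][u]);
                double grad = Math.sqrt(dx * dx + dy * dy);
                double mask = I[v][u] - Ib[v][u];                   // M(u,v) = I - (I * H)
                R[v][u] = grad > tc ? (float) (I[v][u] + a * mask)  // sharpen
                                    : I[v][u];                      // leave pixel unchanged
            }
        return R;
    }
}
```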


Implementation 

The USM filter is available in virtually any image-processing software
and, due to its simplicity and flexibility, has become an indispensable
tool for many professional users. In ImageJ, the USM filter is
implemented by the plugin class UnsharpMask⁸ and can be applied
through the menu

Process > Filters > Unsharp Mask...

8 In package ij.plugin.filter.
Fig. 6.19
Unsharp masking filters with varying smoothing radii σ = 2.5 and 10.0.
The sharpening strength a is set to 1.0 (100%). The profiles show the
intensity function for the image line marked in the original image
(top-left).
[Figure panels: (a) Original, plus the sharpened results with intensity profiles (horizontal axis: Distance (pixels)).]


This  filter  can  also  be  used  from  other  plugin  classes,  for  example,  in 
the  following  way: 

import ij.plugin.filter.UnsharpMask;

...

public void run(ImageProcessor ip) {
    UnsharpMask usm = new UnsharpMask();
    double r = 2.0;   // standard settings for radius
    double a = 0.6;   // standard settings for weight
    usm.sharpen(ip, r, a);
}

ImageJ's UnsharpMask implementation uses the class GaussianBlur
for the required smoothing operation. The alternative implementation
shown in Prog. 6.1 follows the definition in Eqn. (6.38) and uses
Gaussian filter kernels that are created with the method
makeGaussKernel1d(), as defined in Prog. 5.4.


1   double radius = 1.0;  // radius (sigma of Gaussian)
2   double amount = 1.0;  // amount of sharpening (1 = 100%)
3   ...
4   public void run(ImageProcessor ip) {
5     ImageProcessor I = ip.convertToFloat();     // I
6
7     // create a blurred version of the image:
8     ImageProcessor J = I.duplicate();           // Ĩ
9     float[] H = GaussianFilter.makeGaussKernel1d(radius);
10    Convolver cv = new Convolver();
11    cv.setNormalize(true);
12    cv.convolve(J, H, 1, H.length);
13    cv.convolve(J, H, H.length, 1);
14
15    I.multiply(1 + amount);                     // I ← (1 + a) · I
16    J.multiply(amount);                         // Ĩ ← a · Ĩ
17    I.copyBits(J, 0, 0, Blitter.SUBTRACT);      // I ← (1 + a) · I − a · Ĩ
18
19    // copy result back into original byte image
20    ip.insert(I.convertToByte(false), 0, 0);
21  }


Prog. 6.1
Unsharp masking (Java implementation). First the original image is
converted to a FloatProcessor object I in line 5, which is duplicated
to hold the blurred image J (Ĩ) in line 8. The method makeGaussKernel1d(),
defined in Prog. 5.4, is used to create the 1D Gaussian filter kernel
applied in the horizontal and vertical directions (lines 12–13). The
remaining calculations follow Eqn. (6.38).

Laplace vs. USM filter

A closer look at these two methods reveals that sharpening with the
Laplace filter (Sec. 6.6.1) can be viewed as a special case of the USM
filter. If the Laplace filter in Eqn. (6.32) is decomposed as

$$H^L = \begin{bmatrix} 0 & 1 & 0 \\ 1 & -4 & 1 \\ 0 & 1 & 0 \end{bmatrix} = 5 \cdot \left( \frac{1}{5} \begin{bmatrix} 0 & 1 & 0 \\ 1 & 1 & 1 \\ 0 & 1 & 0 \end{bmatrix} - \begin{bmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix} \right) = 5 \cdot (\tilde{H}^L - \delta), \tag{6.40}$$

one can see that $H^L$ consists of a simple 3×3 pixel smoothing filter
$\tilde{H}^L$ minus the impulse function δ. Laplace sharpening with the
weight factor w as defined in Eqn. (6.35) can therefore (by a little
manipulation) be expressed as

$$\begin{aligned} \check{I}_L \leftarrow I - w \cdot (H^L * I) &= I - w \cdot \bigl(5 (\tilde{H}^L - \delta) * I\bigr) \\ &= I - 5w \cdot (\tilde{H}^L * I - I) = I + 5w \cdot (I - \tilde{H}^L * I) \\ &= I + 5w \cdot M_L, \end{aligned} \tag{6.41}$$

that is, in the form of a USM filter $\check{I} \leftarrow I + a \cdot M$ (Eqn. (6.37)).
Laplacian sharpening is thus a special case of a USM filter with the
mask $M = M_L = (I - \tilde{H}^L * I)$, the specific smoothing filter

$$\tilde{H}^L = \frac{1}{5} \begin{bmatrix} 0 & 1 & 0 \\ 1 & 1 & 1 \\ 0 & 1 & 0 \end{bmatrix},$$

and the sharpening factor a = 5w.
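The equivalence in Eqn. (6.41) is easy to check numerically. The sketch below (plain Java, with the hypothetical class name LaplaceUsmCheck) filters a random test image once with $H^L$ and once with $\tilde{H}^L$ and compares Laplace sharpening $I - w \cdot (H^L * I)$ with the USM form $I + 5w \cdot (I - \tilde{H}^L * I)$ at all interior pixels. Border pixels are skipped, since boundary handling is not part of the derivation.

```java
/** Numerical check of Eqns. (6.40)/(6.41); hypothetical helper class. */
final class LaplaceUsmCheck {

    static final double[][] HL = {{0, 1, 0}, {1, -4, 1}, {0, 1, 0}};             // Laplacian H^L
    static final double[][] HS = {{0, 0.2, 0}, {0.2, 0.2, 0.2}, {0, 0.2, 0}};    // smoothing ~H^L

    /** 3x3 filter on interior pixels; border pixels stay 0 and are ignored below. */
    static double[][] filter3x3(double[][] I, double[][] H) {
        int N = I.length, M = I[0].length;
        double[][] R = new double[N][M];
        for (int v = 1; v < N - 1; v++)
            for (int u = 1; u < M - 1; u++) {
                double s = 0;
                for (int j = -1; j <= 1; j++)
                    for (int i = -1; i <= 1; i++)
                        s += H[j + 1][i + 1] * I[v + j][u + i];
                R[v][u] = s;
            }
        return R;
    }

    /** Largest interior difference between Laplace sharpening and USM with a = 5w. */
    static double maxInteriorDiff(double[][] I, double w) {
        double[][] lap = filter3x3(I, HL);
        double[][] smooth = filter3x3(I, HS);
        double maxDiff = 0;
        for (int v = 1; v < I.length - 1; v++)
            for (int u = 1; u < I[0].length - 1; u++) {
                double laplaceSharpened = I[v][u] - w * lap[v][u];          // Eqn. (6.35)
                double usm = I[v][u] + 5 * w * (I[v][u] - smooth[v][u]);    // I + a*M_L, a = 5w
                maxDiff = Math.max(maxDiff, Math.abs(laplaceSharpened - usm));
            }
        return maxDiff;
    }
}
```

Apart from floating-point rounding, the two results agree at every interior pixel.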


6.7  Exercises 


Exercise 6.1. Calculate (manually) the gradient and the Laplacian
(using the discrete approximations in Eqn. (6.2) and Eqn. (6.32),
respectively) for the following “image”:

$$I = \begin{bmatrix} 14 & 10 & 19 & 16 & 14 & 12 \\ 18 & 9 & 11 & 12 & 10 & 19 \\ 9 & 14 & 15 & 26 & 13 & 6 \\ 21 & 27 & 17 & 17 & 19 & 16 \\ 11 & 18 & 18 & 19 & 16 & 14 \\ 16 & 10 & 13 & 7 & 22 & 21 \end{bmatrix}$$


Exercise 6.2. Implement the Sobel edge operator as defined in Eqn.
(6.10) (and illustrated in Fig. 6.6) as an ImageJ plugin. The plugin
should generate two new images for the edge magnitude E(u,v) and
the edge orientation Φ(u,v). Come up with a suitable way to display
local edge orientation.


Exercise 6.3. Express the Sobel operator (Eqn. (6.10)) in x/y-separable
form analogous to the decomposition of the Prewitt operator in
Eqn. (6.9).

Exercise 6.4. Implement the Kirsch operator (Eqns. (6.25)–(6.28))
analogous to the two-directional Sobel operator in Exercise 6.2 and
compare the results from both methods, particularly the edge
orientation estimates.


Exercise 6.5. Devise and implement a compass edge operator with
more than eight (16?) differently oriented filters.

Exercise 6.6. Compare the results of the unsharp masking filters
in ImageJ and Adobe Photoshop using a suitable test image. How
should the parameters σ (radius) and a (weight) be defined in
both implementations to obtain similar results?
7


Corner  Detection 


Corners are prominent structural elements in an image and are
therefore useful in a wide variety of applications, including following
objects across related images (tracking), determining the
correspondence between stereo images, serving as reference points for
precise geometrical measurements, and calibrating camera systems for
machine vision applications. Corner points are not only important in
human vision but are also “robust” in the sense that they do not arise
accidentally in 3D scenes and, furthermore, can be located quite
reliably under a wide range of viewing angles and lighting conditions.


7.1  Points  of  Interest 

Despite  being  easily  recognized  by  our  visual  system,  accurately  and 
precisely  detecting  corners  automatically  is  not  a  trivial  task.  A 
good  corner  detector  must  satisfy  a  number  of  criteria,  including 
distinguishing  between  true  and  accidental  corners,  reliably  detecting 
corners  in  the  presence  of  realistic  image  noise,  and  precisely  and 
accurately  determining  the  locations  of  corners.  Finally,  it  should 
also  be  possible  to  implement  the  detector  efficiently  enough  so  that 
it  can  be  utilized  in  real-time  applications  such  as  video  tracking. 

Numerous  methods  for  finding  corners  or  similar  interest  points 
have  been  proposed  and  most  of  them  take  advantage  of  the  following 
basic  principle.  While  an  edge  is  usually  defined  as  a  location  in  the 
image  at  which  the  gradient  is  especially  high  in  one  direction  and  low 
in  the  direction  normal  to  it,  a  corner  point  is  defined  as  a  location 
that  exhibits  a  strong  gradient  value  in  multiple  directions  at  the 
same  time. 

Most methods take advantage of this observation by examining
the first or second derivative of the image in the x and y directions to
find corners (e.g., [77,102,137,154]). In the next section, we describe
in detail the Harris detector, also known as the “Plessey feature point
detector” [102], since it turns out that even though more efficient
detectors are known (see, e.g., [210,220]), the Harris detector and
other detectors based on it are the most widely used in practice.

© Springer-Verlag London 2016
W. Burger, M.J. Burge, Digital Image Processing, Texts in Computer Science,
DOI 10.1007/978-1-4471-6684-9_7


7.2  Harris  Corner  Detector 

This  operator,  developed  by  Harris  and  Stephens  [102],  is  one  of  a 
group  of  related  methods  based  on  the  same  premise:  a  corner  point 
exists  where  the  gradient  of  the  image  is  especially  strong  in  more 
than  one  direction  at  the  same  time.  In  addition,  locations  along 
edges,  where  the  gradient  is  strong  in  only  one  direction,  should  not 
be  considered  as  corners,  and  the  detector  should  be  isotropic,  that 
is,  independent  of  the  orientation  of  the  local  gradients. 


7.2.1  Local  Structure  Matrix 


The Harris corner detector is based on the first partial derivatives
(gradient) of the image function I(u,v), that is,

$$I_x(u,v) = \frac{\partial I}{\partial x}(u,v) \quad \text{and} \quad I_y(u,v) = \frac{\partial I}{\partial y}(u,v). \tag{7.1}$$


For each image position (u,v), we first calculate the three quantities

$$A(u,v) = I_x^2(u,v), \tag{7.2}$$
$$B(u,v) = I_y^2(u,v), \tag{7.3}$$
$$C(u,v) = I_x(u,v) \cdot I_y(u,v), \tag{7.4}$$

that constitute the elements of the local structure matrix M(u,v)¹

$$\mathbf{M} = \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix} = \begin{bmatrix} A & C \\ C & B \end{bmatrix}. \tag{7.5}$$


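Next, each of the three scalar fields A(u,v), B(u,v), C(u,v) is
individually smoothed by convolution with a linear Gaussian filter
$H^{G,\sigma}$ (see Sec. 5.2.7),

$$\bar{\mathbf{M}} = \begin{bmatrix} A * H^{G,\sigma} & C * H^{G,\sigma} \\ C * H^{G,\sigma} & B * H^{G,\sigma} \end{bmatrix} = \begin{bmatrix} \bar{A} & \bar{C} \\ \bar{C} & \bar{B} \end{bmatrix}. \tag{7.6}$$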


The eigenvalues² of the matrix $\bar{\mathbf{M}}$, defined as³

$$\lambda_{1,2} = \frac{\operatorname{trace}(\bar{\mathbf{M}})}{2} \pm \sqrt{\left(\frac{\operatorname{trace}(\bar{\mathbf{M}})}{2}\right)^2 - \det(\bar{\mathbf{M}})} = \frac{1}{2} \left( \bar{A} + \bar{B} \pm \sqrt{\bar{A}^2 - 2 \bar{A} \bar{B} + \bar{B}^2 + 4 \bar{C}^2} \right), \tag{7.7}$$

are real (because the matrix is symmetric) and non-negative (because
$\bar{\mathbf{M}}$ is a sum of gradient outer products weighted by the non-negative
filter coefficients). They contain essential information about the local
image structure. Within an image region that is uniform (that is, appears
flat), $\bar{\mathbf{M}} = \mathbf{0}$ and therefore $\lambda_1 = \lambda_2 = 0$. On an ideal ramp, however,
the eigenvalues are $\lambda_1 > 0$ and $\lambda_2 = 0$, independent of the orientation
of the edge. The eigenvalues thus encode an edge's strength, and their
associated eigenvectors correspond to the local edge orientation.

A corner should have a strong edge in the main direction (corresponding
to the larger of the two eigenvalues), another edge normal to the first
(corresponding to the smaller eigenvalue), and both eigenvalues must be
significant. Since $\bar{A}, \bar{B} \geq 0$, we can assume that $\operatorname{trace}(\bar{\mathbf{M}}) > 0$ and
thus $|\lambda_1| \geq |\lambda_2|$. Therefore only the smaller of the two eigenvalues,
$\lambda_2 = \operatorname{trace}(\bar{\mathbf{M}})/2 - \sqrt{\ldots}$, is relevant when determining a corner.

1 For improved legibility, we simplify the notation used in the following
by omitting the function coordinates (u,v); for example, the function
$I_x(u,v)$ is abbreviated as $I_x$, or A(u,v) is simply denoted A, etc.
2 See also Sec. B.4 in the Appendix.
3 det(M) denotes the determinant and trace(M) denotes the trace of the
matrix M (see, e.g., [35, pp. 252 and 259]).
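The closed-form expression in Eqn. (7.7) translates directly into code. The sketch below (hypothetical class StructureEigen, plain Java) sums the matrices $g \cdot g^{\mathsf{T}}$ of a few gradient vectors g, as a simple stand-in for the filtered sums in Eqn. (7.6), and then evaluates both eigenvalues. Several parallel gradients mimic an ideal ramp; two orthogonal gradients of equal strength mimic a corner.

```java
/** Eigenvalues of the 2x2 structure matrix per Eqn. (7.7); names are illustrative only. */
final class StructureEigen {

    /** Sums g*g^T over the given gradients g = (gx, gy); returns {A, B, C}. */
    static double[] structureMatrix(double[][] gradients) {
        double A = 0, B = 0, C = 0;
        for (double[] g : gradients) {
            A += g[0] * g[0];   // I_x^2
            B += g[1] * g[1];   // I_y^2
            C += g[0] * g[1];   // I_x * I_y
        }
        return new double[] {A, B, C};
    }

    /** Returns {lambda1, lambda2} with lambda1 >= lambda2, following Eqn. (7.7). */
    static double[] eigenvalues(double A, double B, double C) {
        double root = Math.sqrt(A * A - 2 * A * B + B * B + 4 * C * C);  // = sqrt((A-B)^2 + 4C^2)
        return new double[] {0.5 * (A + B + root), 0.5 * (A + B - root)};
    }
}
```

For the parallel gradients (3, 4) and (6, 8), the matrix has rank one, so λ₂ = 0 and λ₁ = 125 equals the summed squared gradient magnitudes; for (5, 0) and (0, 5), both eigenvalues are 25.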

7.2.2  Corner  Response  Function  (CRF) 

From Eqn. (7.7) we see that the difference between the two eigenvalues
of the local structure matrix is

$$\lambda_1 - \lambda_2 = 2 \cdot \sqrt{0.25 \cdot \bigl(\operatorname{trace}(\bar{\mathbf{M}})\bigr)^2 - \det(\bar{\mathbf{M}})}, \tag{7.8}$$

where the expression under the square root is always non-negative.

At a good corner position, the difference between the two eigenvalues
$\lambda_1, \lambda_2$ should be as small as possible and thus the expression under
the root in Eqn. (7.8) should be a minimum. To avoid the explicit
calculation of the eigenvalues (and the square root), the Harris detector
defines the function

$$\begin{aligned} Q(u,v) &= \det\bigl(\bar{\mathbf{M}}(u,v)\bigr) - \alpha \cdot \bigl(\operatorname{trace}(\bar{\mathbf{M}}(u,v))\bigr)^2 \\ &= \bar{A}(u,v) \cdot \bar{B}(u,v) - \bar{C}^2(u,v) - \alpha \cdot \bigl[\bar{A}(u,v) + \bar{B}(u,v)\bigr]^2 \end{aligned} \tag{7.9}$$

as a measure of “corner strength”, where the parameter α determines
the sensitivity of the detector. Q(u,v) is called the “corner response
function” and returns maximum values at isolated corners. In practice,
α is assigned a fixed value in the range of 0.04 to 0.06 (max.
0.25 = 1/4). The larger the value of α, the less sensitive the detector
is and the fewer corners are detected.
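Since $\det(\bar{\mathbf{M}}) = \lambda_1 \lambda_2$ and $\operatorname{trace}(\bar{\mathbf{M}}) = \lambda_1 + \lambda_2$, Eqn. (7.9) can also be written as $Q = \lambda_1 \lambda_2 - \alpha \cdot (\lambda_1 + \lambda_2)^2$. For an ideal edge ($\lambda_2 = 0$) this gives $Q = -\alpha \lambda_1^2 < 0$, while for two equal eigenvalues $\lambda_1 = \lambda_2 = \lambda$ it gives $Q = \lambda^2 (1 - 4\alpha)$, which is positive only for α < 0.25; this explains the upper limit stated above. The small sketch below (hypothetical class CornerResponse) evaluates both forms.

```java
/** Harris corner response of Eqn. (7.9); class and method names are illustrative only. */
final class CornerResponse {

    /** Q = det(M) - alpha * trace(M)^2 for M = [[A, C], [C, B]]. */
    static double crf(double A, double B, double C, double alpha) {
        double det = A * B - C * C;
        double trace = A + B;
        return det - alpha * trace * trace;
    }

    /** The same quantity expressed by the eigenvalues: Q = l1*l2 - alpha*(l1 + l2)^2. */
    static double crfFromEigenvalues(double l1, double l2, double alpha) {
        return l1 * l2 - alpha * (l1 + l2) * (l1 + l2);
    }
}
```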

7.2.3  Determining  Corner  Points 

An image location (u,v) is selected as a potential candidate for a
corner point if

$$Q(u,v) > t_H,$$

where the threshold $t_H$ is selected based on image content and
typically lies within the range of 10,000 to 1,000,000. Once selected,
the corners $c_i = (u_i, v_i, q_i)$ are inserted into the sequence

$$\mathcal{C} = (c_1, c_2, \ldots, c_N),$$

which is then sorted in descending order (i.e., $q_i \geq q_{i+1}$) according to
corner strength $q_i = Q(u_i, v_i)$, as defined in Eqn. (7.9). To suppress
the false corners that tend to arise in densely packed groups around
true corners, all except the strongest corner in a specified vicinity
are eliminated. To accomplish this, the list $\mathcal{C}$ is traversed from the
front to the back, and the weaker corners toward the end of the list,
which lie in the surrounding neighborhood of a stronger corner, are
deleted.

Table 7.1
Harris corner detector: typical parameter settings for Alg. 7.1.

Prefilter (Alg. 7.1, lines 2–3): Smoothing with a small xy-separable
filter $H_p = H_{px} * H_{py}$, where
$$H_{px} = \frac{1}{9} \cdot \begin{bmatrix} 2 & 5 & 2 \end{bmatrix} \quad \text{and} \quad H_{py} = H_{px}^{\mathsf{T}} = \frac{1}{9} \cdot \begin{bmatrix} 2 \\ 5 \\ 2 \end{bmatrix}.$$

Gradient filter (Alg. 7.1, lines 2–3): Computing the first partial
derivative in the x and y directions with
$$H_{dx} = \begin{bmatrix} -0.5 & 0 & 0.5 \end{bmatrix} \quad \text{and} \quad H_{dy} = H_{dx}^{\mathsf{T}} = \begin{bmatrix} -0.5 \\ 0 \\ 0.5 \end{bmatrix}.$$

Blur filter (Alg. 7.1, lines 10–12): Smoothing the individual components
of the structure matrix M with separable Gaussian filters
$H_b = H_{bx} * H_{by}$ with
$$H_{bx} = \frac{1}{64} \cdot \begin{bmatrix} 1 & 6 & 15 & 20 & 15 & 6 & 1 \end{bmatrix} \quad \text{and} \quad H_{by} = H_{bx}^{\mathsf{T}}.$$

Control parameter (Alg. 7.1, line 14): α = 0.04, ..., 0.06 (default 0.05).

Response threshold (Alg. 7.1, line 17): $t_H$ = 10 000, ..., 1 000 000
(default 20 000).

Neighborhood radius (Alg. 7.1, line 36): $d_{\min}$ = 10 pixels.

The  complete  algorithm  for  the  Harris  detector  is  summarized 
again  in  Alg.  7.1;  the  associated  parameters  are  listed  in  Table  7.1. 
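To see how the pieces of Alg. 7.1 fit together, here is a compact, dependency-free Java sketch that runs the whole pipeline on a float[][] image with the default settings from Table 7.1 (α = 0.05, $t_H$ = 20 000, $d_{\min}$ = 10). It is not the imagingbook implementation described in Sec. 7.3: the class and method names (HarrisSketch, findCorners(), filter1d()) are hypothetical, the kernels are applied in correlation form with clamped borders, and the clean-up keeps a candidate only if no already accepted (stronger) corner lies within $d_{\min}$, which selects the same corners as CleanUpCorners().

```java
import java.util.ArrayList;
import java.util.List;

/** Compact sketch of Alg. 7.1 on a float[][] image I[v][u]; hypothetical helper code. */
final class HarrisSketch {

    /** A corner candidate c = (u, v, q). */
    static final class Pt {
        final int u, v;
        final float q;
        Pt(int u, int v, float q) { this.u = u; this.v = v; this.q = q; }
    }

    static final float[] HP = {2f / 9, 5f / 9, 2f / 9};                    // prefilter
    static final float[] HD = {-0.5f, 0f, 0.5f};                           // gradient
    static final float[] HB = {1f / 64, 6f / 64, 15f / 64, 20f / 64, 15f / 64, 6f / 64, 1f / 64};

    /** Applies a 1D kernel along x or y (correlation form), clamping at the image border. */
    static float[][] filter1d(float[][] I, float[] h, boolean alongX) {
        int N = I.length, M = I[0].length, r = h.length / 2;
        float[][] R = new float[N][M];
        for (int v = 0; v < N; v++)
            for (int u = 0; u < M; u++) {
                float s = 0;
                for (int k = -r; k <= r; k++) {
                    int uu = alongX ? Math.max(0, Math.min(M - 1, u + k)) : u;
                    int vv = alongX ? v : Math.max(0, Math.min(N - 1, v + k));
                    s += h[k + r] * I[vv][uu];
                }
                R[v][u] = s;
            }
        return R;
    }

    static float[][] blurXY(float[][] X) {
        return filter1d(filter1d(X, HB, true), HB, false);
    }

    /** True if Q(u,v) is not smaller than any of its 8 neighbors. */
    static boolean isLocalMax(float[][] Q, int u, int v) {
        for (int dv = -1; dv <= 1; dv++)
            for (int du = -1; du <= 1; du++)
                if (Q[v + dv][u + du] > Q[v][u]) return false;
        return true;
    }

    static List<Pt> findCorners(float[][] I, float alpha, float tH, double dmin) {
        int N = I.length, M = I[0].length;
        float[][] Ix = filter1d(filter1d(I, HP, true), HD, true);    // Alg. 7.1, line 2
        float[][] Iy = filter1d(filter1d(I, HP, false), HD, false);  // Alg. 7.1, line 3
        float[][] A = new float[N][M], B = new float[N][M], C = new float[N][M];
        for (int v = 0; v < N; v++)
            for (int u = 0; u < M; u++) {                            // lines 7-9
                A[v][u] = Ix[v][u] * Ix[v][u];
                B[v][u] = Iy[v][u] * Iy[v][u];
                C[v][u] = Ix[v][u] * Iy[v][u];
            }
        A = blurXY(A);                                               // lines 10-12
        B = blurXY(B);
        C = blurXY(C);
        float[][] Q = new float[N][M];
        for (int v = 0; v < N; v++)
            for (int u = 0; u < M; u++) {                            // line 14
                float tr = A[v][u] + B[v][u];
                Q[v][u] = A[v][u] * B[v][u] - C[v][u] * C[v][u] - alpha * tr * tr;
            }
        List<Pt> cand = new ArrayList<>();                           // lines 15-19
        for (int v = 1; v < N - 1; v++)
            for (int u = 1; u < M - 1; u++)
                if (Q[v][u] > tH && isLocalMax(Q, u, v)) cand.add(new Pt(u, v, Q[v][u]));
        cand.sort((p1, p2) -> Float.compare(p2.q, p1.q));          // strongest first
        List<Pt> clean = new ArrayList<>();                          // clean-up step
        for (Pt c : cand) {
            boolean keep = true;
            for (Pt k : clean) {
                double du = c.u - k.u, dv = c.v - k.v;
                if (du * du + dv * dv < dmin * dmin) { keep = false; break; }
            }
            if (keep) clean.add(c);
        }
        return clean;
    }
}
```

On a synthetic 40 × 40 image showing a bright 20 × 20 square, this sketch reports one corner near each of the four square corners; pixels along the straight edges yield negative responses and are never selected.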


7.2.4  Examples 

Figure 7.1 uses a simple synthetic image to illustrate the most
important steps in corner detection using the Harris detector. The figure
shows the result of the gradient computation, the three components
of the structure matrix $\bar{\mathbf{M}}(u,v) = \begin{pmatrix} \bar{A} & \bar{C} \\ \bar{C} & \bar{B} \end{pmatrix}$, and the values of the
corner response function Q(u,v) for each image position (u,v). This
example was calculated with the standard settings as given in Table 7.1.

The second example (Fig. 7.2) illustrates the detection of corner
points in a grayscale representation of a natural scene. It demonstrates
how weak corners are eliminated in favor of the strongest
corner in a region.
Alg. 7.1
Harris corner detector. This algorithm takes an intensity image I and
creates a sorted list of detected corner points. * is the convolution
operator used for linear filter operations. Details for the parameters
$H_p$, $H_{dx}$, $H_{dy}$, $H_b$, α, and $t_H$ can be found in Table 7.1.

 1: HarrisCorners(I, α, t_H, d_min)
      Input: I, the source image; α, sensitivity parameter (typ. 0.05);
      t_H, response threshold (typ. 20 000); d_min, minimum distance
      between final corners. Returns a sequence of the strongest corners
      detected in I.

      Step 1: calculate the corner response function
 2:   I_x ← (I * H_px) * H_dx              ▷ horizontal prefilter and derivative
 3:   I_y ← (I * H_py) * H_dy              ▷ vertical prefilter and derivative
 4:   (M, N) ← Size(I)
 5:   Create maps A, B, C, Q : M × N ↦ ℝ
 6:   for all image coordinates (u, v) do
        Compute the local structure matrix M = (A C; C B):
 7:     A(u, v) ← (I_x(u, v))²
 8:     B(u, v) ← (I_y(u, v))²
 9:     C(u, v) ← I_x(u, v) · I_y(u, v)
      Blur the components of the local structure matrix (M̄):
10:   A ← A * H_b
11:   B ← B * H_b
12:   C ← C * H_b
13:   for all image coordinates (u, v) do   ▷ calc. corner response:
14:     Q(u, v) ← A(u, v) · B(u, v) − C²(u, v) − α · [A(u, v) + B(u, v)]²

      Step 2: collect the corner points
15:   𝒞 ← ()                               ▷ start with an empty corner sequence
16:   for all image coordinates (u, v) do
17:     if Q(u, v) > t_H ∧ IsLocalMax(Q, u, v) then
18:       c ← (u, v, Q(u, v))                ▷ create a new corner c
19:       𝒞 ← 𝒞 ⌢ (c)                        ▷ add c to corner sequence 𝒞
20:   𝒞_clean ← CleanUpCorners(𝒞, d_min)
21:   return 𝒞_clean

22: IsLocalMax(Q, u, v)                    ▷ determine if Q(u, v) is a local maximum
23:   𝒩 ← GetNeighbors(Q, u, v)            ▷ see below
24:   return Q(u, v) > max(𝒩)              ▷ true or false

25: GetNeighbors(Q, u, v)
      Returns the 8 neighboring values around Q(u, v).
26:   𝒩 ← (Q(u+1, v), Q(u+1, v−1), Q(u, v−1), Q(u−1, v−1),
            Q(u−1, v), Q(u−1, v+1), Q(u, v+1), Q(u+1, v+1))
27:   return 𝒩

28: CleanUpCorners(𝒞, d_min)
29:   Sort(𝒞)                              ▷ sort 𝒞 by desc. q_i (strongest corners first)
30:   𝒞_clean ← ()                         ▷ empty "clean" corner sequence
31:   while 𝒞 is not empty do
32:     c_0 ← GetFirst(𝒞)                  ▷ get the strongest corner from 𝒞
33:     𝒞 ← Delete(c_0, 𝒞)                 ▷ the 1st element is removed from 𝒞
34:     𝒞_clean ← 𝒞_clean ⌢ (c_0)          ▷ add c_0 to 𝒞_clean
35:     for all c_j in 𝒞 do
36:       if Dist(c_0, c_j) < d_min then
37:         𝒞 ← Delete(c_j, 𝒞)             ▷ remove element c_j from 𝒞
38:   return 𝒞_clean
Fig. 7.1
Harris corner detector, Example 1. Starting with the original image
I(u,v), the first derivatives are computed, and then from them the
components of the structure matrix M(u,v), with $A(u,v) = I_x^2(u,v)$,
$B(u,v) = I_y^2(u,v)$, $C(u,v) = I_x(u,v) \cdot I_y(u,v)$. A(u,v) and B(u,v)
represent, respectively, the strength of the horizontal and vertical
gradient components. In C(u,v), the values are strongly positive (white)
or strongly negative (black) only where the edges are strong in both
directions (null values are shown in gray). The corner response
function, Q(u,v), exhibits noticeable positive peaks at the corner
positions.
[Figure panels include I(u,v) and the resulting corner points.]


7.3  Implementation 


Since the Harris detector algorithm is more complex than the
algorithms we presented earlier, in the following sections we explain
its implementation in greater detail. While reading the following,
you may wish to refer to the complete source code for the class
HarrisCornerDetector, which is available online as part of the
imagingbook library.⁴

4 Package imagingbook.pub.corners.
Fig. 7.2
Harris corner detector, Example 2. A complete result with the final
corner points marked (a). After selecting the strongest corner points
within a 10-pixel radius, only 335 of the original 615 candidate corners
remain. Details before (b, c) and after selection (d, e).


7.3.1  Step  1:  Calculating  the  Corner  Response  Function 

To handle the range of the positive and negative values generated by
the filters used in this step, we need to use floating-point images
to store the intermediate results, which also ensures sufficient range
and precision for small values. The kernels of the required filters,
that is, the presmoothing filter $H_p$, the gradient filters $H_{dx}$, $H_{dy}$,
and the smoothing filter for the structure matrix $H_b$, are defined as
1D float arrays:


1  float[] hp = {2f/9, 5f/9, 2f/9};
2  float[] hd = {-0.5f, 0, 0.5f};   // sign convention does not affect A, B, C
3  float[] hb =
4      {1f/64, 6f/64, 15f/64, 20f/64, 15f/64, 6f/64, 1f/64};

From the original 8-bit image (of type ByteProcessor), we first create
two copies, Ix and Iy, of type FloatProcessor:

5  FloatProcessor Ix = I.convertToFloatProcessor();
6  FloatProcessor Iy = I.convertToFloatProcessor();

The first processing step is to presmooth the image with the 1D
filter kernel hp (= $h_{px} = h_{py}^{\mathsf{T}}$). Subsequently the 1D gradient filter
hd (= $h_{dx} = h_{dy}^{\mathsf{T}}$) is used to calculate the horizontal and vertical
derivatives (see Alg. 7.1, lines 2–3). To perform the convolution with
the corresponding 1D kernels, we use the (static) methods convolveX()
and convolveY() defined in class Filter:⁵

7   Filter.convolveX(Ix, hp);   // Ix ← Ix * hpx
8   Filter.convolveX(Ix, hd);   // Ix ← Ix * hdx
9   Filter.convolveY(Iy, hp);   // Iy ← Iy * hpy
10  Filter.convolveY(Iy, hd);   // Iy ← Iy * hdy
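The Filter class itself is not listed here. For experimenting without ImageJ, a rough stand-in working on float[][] arrays could look like the following; the class name FilterSketch, the correlation form, and the clamped borders are assumptions, and unlike this sketch the methods used above modify a FloatProcessor in place. For symmetric kernels such as hp and hb, correlation and convolution give identical results, and with the gradient kernel {−0.5, 0, 0.5} the horizontal pass yields the central difference (I(u+1,v) − I(u−1,v))/2.

```java
/** Hypothetical float[][] stand-ins for the 1D filter passes (not the imagingbook Filter class). */
final class FilterSketch {

    /** Horizontal pass: R(u,v) = sum_k h[k+r] * I(u+k, v), borders clamped. */
    static float[][] filterX(float[][] I, float[] h) {
        int N = I.length, M = I[0].length, r = h.length / 2;
        float[][] R = new float[N][M];
        for (int v = 0; v < N; v++)
            for (int u = 0; u < M; u++) {
                float s = 0;
                for (int k = -r; k <= r; k++)
                    s += h[k + r] * I[v][Math.max(0, Math.min(M - 1, u + k))];
                R[v][u] = s;
            }
        return R;
    }

    /** Vertical pass: R(u,v) = sum_k h[k+r] * I(u, v+k), borders clamped. */
    static float[][] filterY(float[][] I, float[] h) {
        int N = I.length, M = I[0].length, r = h.length / 2;
        float[][] R = new float[N][M];
        for (int v = 0; v < N; v++)
            for (int u = 0; u < M; u++) {
                float s = 0;
                for (int k = -r; k <= r; k++)
                    s += h[k + r] * I[Math.max(0, Math.min(N - 1, v + k))][u];
                R[v][u] = s;
            }
        return R;
    }
}
```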

Now the components A(u,v), B(u,v), C(u,v) of the structure matrix M
are calculated for all image positions (u,v):

11  A = ImageMath.sqr(Ix);        // A(u,v) ← Ix²(u,v)
12  B = ImageMath.sqr(Iy);        // B(u,v) ← Iy²(u,v)
13  C = ImageMath.mult(Ix, Iy);   // C(u,v) ← Ix(u,v) · Iy(u,v)
14

The components of the structure matrix are then smoothed with a
separable filter kernel $H_b = h_{bx} * h_{by}$:

15  Filter.convolveXY(A, hb);     // A ← (A * hbx) * hby
16  Filter.convolveXY(B, hb);     // B ← (B * hbx) * hby
17  Filter.convolveXY(C, hb);     // C ← (C * hbx) * hby

The variables A, B, C of type FloatProcessor are declared in the
class HarrisCornerDetector. sqr() and mult() are static methods
of class ImageMath for squaring an image and multiplying two images,
respectively. The method convolveXY(I, h) is used to apply an
x/y-separable 2D convolution with the 1D kernel h to the image I.

Finally, the corner response function (Alg. 7.1, line 14) is
calculated by the method makeCrf(), which creates a new image (of type
FloatProcessor):

18  private FloatProcessor makeCrf(float alpha) {
19    FloatProcessor Q = new FloatProcessor(M, N);
20    final float[] pA = (float[]) A.getPixels();
21    final float[] pB = (float[]) B.getPixels();
22    final float[] pC = (float[]) C.getPixels();
23    final float[] pQ = (float[]) Q.getPixels();
24    for (int i = 0; i < M * N; i++) {
25      float a = pA[i], b = pB[i], c = pC[i];
26      float det = a * b - c * c;    // det(M)
27      float trace = a + b;          // trace(M)
28      pQ[i] = det - alpha * (trace * trace);
29    }
30    return Q;
31  }

5 Package imagingbook.lib.image.


7.3.2  Step  2:  Selecting  “Good”  Corner  Points 

The result of the first stage of Alg. 7.1 is the corner response
function Q(u,v), which in our implementation is stored as a floating-point
image (FloatProcessor). In the second stage, the dominant corner
points are selected from Q. For this we need (a) an object type to
describe the corners and (b) a flexible container in which to store
these objects. In this case, the container should be a dynamic data
structure since the number of objects to be stored is not known
beforehand.

The  Corner  class 

Next we define a new class Corner⁶ to represent individual corner
points c = (x, y, q), with a single constructor (in line 35) that takes
the float parameters x, y for the position and q for the corner strength:

32  public class Corner implements Comparable<Corner> {
33    final float x, y, q;
34
35    public Corner(float x, float y, float q) {
36      this.x = x;
37      this.y = y;
38      this.q = q;
39    }
40
41    public int compareTo(Corner c2) {
42      if (this.q > c2.q) return -1;
43      if (this.q < c2.q) return 1;
44      else return 0;
45    }
46
47  }

The class Corner implements Java's Comparable interface, such that
objects of type Corner can be compared with each other and thereby
sorted into an ordered sequence. The compareTo() method required
by the Comparable interface is defined (in line 41) to sort corners by
descending q values.
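The listing above omits some methods that are used later, such as dist2() (line 91) and getX()/getY() (lines 127–128). A self-contained variant with these methods might look as follows; the class name CornerSketch is hypothetical, and using Float.compare() instead of the two explicit comparisons is a stylistic choice that gives the same descending order for ordinary (non-NaN) values.

```java
/** Hypothetical self-contained variant of the Corner class with the helper methods used later. */
final class CornerSketch implements Comparable<CornerSketch> {
    final float x, y, q;   // position and corner strength

    CornerSketch(float x, float y, float q) {
        this.x = x;
        this.y = y;
        this.q = q;
    }

    float getX() { return x; }
    float getY() { return y; }

    /** Squared Euclidean distance (x0 - xj)^2 + (y0 - yj)^2; no square root needed. */
    double dist2(CornerSketch c) {
        double dx = this.x - c.x, dy = this.y - c.y;
        return dx * dx + dy * dy;
    }

    /** Negative if this corner is stronger, so sorting puts the strongest corner first. */
    @Override
    public int compareTo(CornerSketch other) {
        return Float.compare(other.q, this.q);
    }
}
```

After Collections.sort(), the strongest corner is the first list element, exactly as required by the clean-up step.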

Choosing  a  suitable  container 

In Alg. 7.1, we used the notion of sequences or lists to organize
and manipulate the collections of potential corner points generated
at various stages. One solution would be to use arrays, but since
the size of an array must be declared before it is used, we would
have to allocate memory for extremely large arrays in order to store
all the possible corner points that might be identified. Instead, we
make use of the ArrayList class, which is one of many dynamic data
structures conveniently provided by Java's Collections Framework.⁷

6 Package imagingbook.pub.corners.
The collectCorners() method

The method collectCorners() outlined here selects the dominant
corner points from the corner response function Q(u,v). The
parameter border specifies the width of the image's border, within
which corner points should be ignored.

48  List<Corner> collectCorners(FloatProcessor Q, float tH, int border) {
49    List<Corner> C = new ArrayList<Corner>();
50    for (int v = border; v < N - border; v++) {
51      for (int u = border; u < M - border; u++) {
52        float q = Q.getf(u, v);
53        if (q > tH && isLocalMax(Q, u, v)) {
54          Corner c = new Corner(u, v, q);
55          C.add(c);
56        }
57      }
58    }
59    return C;
60  }

First (in line 49), a new instance of ArrayList⁸ is created and
assigned to the variable C. Then the CRF image Q is traversed, and
when a potential corner point is located, a new Corner is
instantiated (line 54) and added to C (line 55). The Boolean method
isLocalMax() (defined in class HarrisCornerDetector) determines
if the 2D function Q is a local maximum at the given position (u, v):

61  boolean isLocalMax(FloatProcessor Q, int u, int v) {
62    if (u <= 0 || u >= M - 1 || v <= 0 || v >= N - 1) {
63      return false;
64    }
65    else {
66      float[] q = (float[]) Q.getPixels();
67      int i0 = (v - 1) * M + u;
68      int i1 = v * M + u;
69      int i2 = (v + 1) * M + u;
70      float q0 = q[i1];
71      return  // check 8 neighbors of q0:
72        q0 >= q[i0 - 1] && q0 >= q[i0] && q0 >= q[i0 + 1] &&
73        q0 >= q[i1 - 1] &&                 q0 >= q[i1 + 1] &&
74        q0 >= q[i2 - 1] && q0 >= q[i2] && q0 >= q[i2 + 1];
75    }
76  }


7.3.3  Step  3:  Cleaning  up 

The final step is to remove all but the strongest corner within any
neighborhood of radius $d_{\min}$ (Alg. 7.1, lines 29–38). This process is
outlined in Fig. 7.3 and implemented by the following method
cleanupCorners():

8  The  specification  ArrayList<Corner>  indicates  that  the  list  C  may  only 
contain  objects  of  type  Corner. 
77  List<Corner> cleanupCorners(List<Corner> C, double dmin) {
78    double dmin2 = dmin * dmin;
79    // sort corners by descending q-value:
80    Collections.sort(C);
81    // we use an array of corners for efficiency reasons:
82    Corner[] Ca = C.toArray(new Corner[C.size()]);
83    List<Corner> Cclean = new ArrayList<Corner>(C.size());
84    for (int i = 0; i < Ca.length; i++) {
85      Corner c0 = Ca[i];    // get next strongest corner
86      if (c0 != null) {
87        Cclean.add(c0);
88        // delete all remaining corners cj too close to c0:
89        for (int j = i + 1; j < Ca.length; j++) {
90          Corner cj = Ca[j];
91          if (cj != null && c0.dist2(cj) < dmin2)
92            Ca[j] = null;   // delete corner cj from Ca
93        }
94      }
95    }
96    return Cclean;
97  }


Fig. 7.3
Selecting the strongest corners within a given spatial distance.
(a) Sample corner positions in the 2D plane. (b) The original list of
corners ($\mathcal{C}$) is sorted by “corner strength” (q) in descending order;
that is, $c_0$ is the strongest corner. First, corner $c_0$ is added to a new
list $\mathcal{C}_{\mathrm{clean}}$, while the weaker corners $c_4$ and $c_8$ (which are both within
distance $d_{\min}$ from $c_0$) are removed from the original list $\mathcal{C}$. The
following corners $c_1, c_2, \ldots$ are treated similarly until no more
elements remain in $\mathcal{C}$. None of the corners in the resulting list $\mathcal{C}_{\mathrm{clean}}$
is closer to another corner than $d_{\min}$.


Initially (in line 80) the corner list C is sorted by decreasing corner
strength q by calling the static method sort().⁹ The sorted sequence
is then converted to an array (line 82), which is traversed from start
to end (lines 84–95). Each corner c0 that has not been deleted yet is
added to the final corner sequence Cclean (line 87), and all subsequent
corners cj closer than dmin to c0 are deleted from the array (line 92).
Only the “surviving” corners thus end up in Cclean.

Note that the call c0.dist2(cj) in line 91 returns the squared
Euclidean distance between the corner points c0 and cj, that is, the
quantity $d^2 = (x_0 - x_j)^2 + (y_0 - y_j)^2$. Since the square of the distance
suffices for the comparison, we do not need to compute the actual
distance, and consequently we avoid calling the expensive square root
function. This is a common trick when comparing distances.


7.3.4  Summary 

Most of the implementation steps we have just described are initiated through calls from the method findCorners() in class HarrisCornerDetector:

public List<Corner> findCorners() {
  FloatProcessor Q = makeCrf((float) params.alpha);
  List<Corner> corners =
      collectCorners(Q, (float) params.tH, params.border);
  if (params.doCleanUp) {
    corners = cleanupCorners(corners, params.dmin);
  }
  return corners;
}

9 Defined in class java.util.Collections.

An example of how to use the class HarrisCornerDetector is shown by the associated ImageJ plugin Find_Corners, whose run() method consists of only a few lines of code. This method simply creates a new object of the class HarrisCornerDetector, calls the findCorners() method, and finally displays the results in a new image (R):

import ij.ImagePlus;
import ij.plugin.filter.PlugInFilter;
import ij.process.ColorProcessor;
import ij.process.ImageProcessor;
import java.awt.Color;
import java.util.List;

public class Find_Corners implements PlugInFilter {
  // marker appearance (values assumed here for illustration)
  static Color cornerColor = Color.green;
  static int cornerSize = 2;

  // required by PlugInFilter: accept all image types, do not modify the input
  public int setup(String arg, ImagePlus im) {
    return DOES_ALL + NO_CHANGES;
  }

  public void run(ImageProcessor ip) {
    HarrisCornerDetector cd = new HarrisCornerDetector(ip);
    List<Corner> corners = cd.findCorners();
    ColorProcessor R = ip.convertToColorProcessor();
    drawCorners(R, corners);
    (new ImagePlus("Result", R)).show();
  }

  // draws every detected corner as a small cross
  void drawCorners(ImageProcessor ip, List<Corner> corners) {
    ip.setColor(cornerColor);
    for (Corner c : corners) {
      drawCorner(ip, c);
    }
  }

  void drawCorner(ImageProcessor ip, Corner c) {
    int size = cornerSize;
    int x = Math.round(c.getX());
    int y = Math.round(c.getY());
    ip.drawLine(x - size, y, x + size, y);   // horizontal bar
    ip.drawLine(x, y - size, x, y + size);   // vertical bar
  }
}

For completeness, the definition of the drawCorners() method has been included here; the complete source code can be found online. Again, when writing this code, the focus is on understandability and not necessarily on speed and memory usage. Many elements of the code can be optimized with relatively little effort (perhaps as an exercise?) if efficiency becomes important.


7.4  Exercises 

Exercise 7.1. Adapt the draw() method in the class Corner (see p. 155) so that the strength (q-value) of the corner points can also be visualized. This could be done, for example, by manipulating the size, color, or intensity of the markers drawn in relation to the strength of the corner.

Exercise 7.2. Conduct a series of experiments to determine how image contrast affects the performance of the Harris detector, and then develop an idea for how you might automatically determine the parameter tH depending on image content.

Exercise 7.3. Explore how rotation and distortion of the image affect the performance of the Harris corner detector. Based on your experiments, is the operator truly isotropic?

Exercise 7.4. Determine how image noise affects the performance of the Harris detector in terms of the positional accuracy of the detected corners and the omission of actual corners. Remark: ImageJ's menu command Process > Noise > Add Specified Noise... can be used to easily add certain types of random noise to a given image.




8 Finding Simple Curves: The Hough Transform


In Chapter 6 we demonstrated how to use appropriately designed filters to detect edges in images. These filters compute both the edge strength and orientation at every position in the image. In the following sections, we explain how to decide (e.g., by using a threshold operation on the edge strength) if a curve is actually present at a given image location. The result of this process is generally represented as a binary edge map. Edge maps are considered preliminary results, since with an edge filter's limited (“myopic”) view it is not possible to accurately ascertain if a point belongs to a true edge. Edge maps created using simple threshold operations contain many edge points that do not belong to true edges (false positives), and, on the other hand, many edge points are not detected and hence are missing from the map (false negatives).


8.1  Salient  Image  Structures 

An intuitive approach to locating large image structures is to first select an arbitrary edge point, systematically examine its neighboring pixels and add them if they belong to the object's contour, and repeat. In principle, such an approach could be applied to either a continuous edge map consisting of edge strengths and orientations or a simple binary edge map. Unfortunately, with either input, such an approach is likely to fail due to image noise and ambiguities that arise when trying to follow the contours. Additional constraints and information about the type of object sought are needed in order to handle pixel-level problems such as branching, as well as interruptions. This type of local sequential contour tracing makes for an interesting optimization problem [128] (see also Sec. 10.2).

A completely different approach is to search for globally apparent structures that consist of certain simple shape features. As an example, Fig. 8.1 shows that certain structures are readily apparent to the human visual system, even when they overlap in noisy images. The biological basis for why the human visual system spontaneously recognizes four lines or three ellipses in Fig. 8.1 instead of a larger number of disjoint segments and arcs is not completely known. At the cognitive level, theories such as “Gestalt” grouping have been proposed to address this behavior. The next sections explore one technique, the Hough transform, that provides an algorithmic solution to this problem.

Fig. 8.1
The human visual system is capable of instantly recognizing prominent image structures even under difficult conditions.


8.2  The  Hough  Transform 


The method proposed by Paul Hough, originally published as a US patent [111] and often referred to as the “Hough transform” (HT), is a general approach to localizing any shape that can be defined parametrically within a distribution of points [64, 117]. For example, many geometrical shapes, such as lines, circles, and ellipses, can be readily described using simple equations with only a few parameters. Since simple geometric forms often occur as part of man-made objects, they are especially useful features for analysis of these types of images (Fig. 8.2).

The Hough transform is perhaps most often used for detecting straight line segments in edge maps. A line segment in 2D can be described with two real-valued parameters using the classic slope-intercept form

$$y = k \cdot x + d, \qquad (8.1)$$

where k is the slope and d the intercept, that is, the height at which the line would intercept the y axis (Fig. 8.3). A line segment that passes through two given edge points $p_1 = (x_1, y_1)$ and $p_2 = (x_2, y_2)$ must satisfy the conditions

$$y_1 = k \cdot x_1 + d \quad\text{and}\quad y_2 = k \cdot x_2 + d, \qquad (8.2)$$

for $k, d \in \mathbb{R}$. The goal is to find values of k and d such that as many edge points as possible lie on the line they describe; in other words, the line that fits the most edge points. But how can you determine the number of edge points that lie on a given line segment? One possibility is to exhaustively “draw” every possible line segment into the image while counting the number of points that lie exactly on each of these. Even though the discrete nature of pixel images (with only a finite number of different lines) makes this approach possible in theory, generating such a large number of lines is infeasible in practice.

Fig. 8.2
Simple geometrical forms such as sections of lines, circles, and ellipses are often found in man-made objects.

Fig. 8.3
Two points, $p_1$ and $p_2$, lie on the same line when $y_1 = k x_1 + d$ and $y_2 = k x_2 + d$ for a particular pair of parameters k and d.

8.2.1  Parameter  Space 

The Hough transform approaches the problem from another direction. It examines all the possible line segments that run through a single given point in the image. Every line $L_j = (k_j, d_j)$ that runs through a point $p_0 = (x_0, y_0)$ must satisfy the condition

$$L_j:\; y_0 = k_j \cdot x_0 + d_j \qquad (8.3)$$

for suitable values $k_j, d_j$. Equation (8.3) is underdetermined, and the possible solutions for $k_j, d_j$ correspond to an infinite set of lines passing through the given point $p_0$ (Fig. 8.4). Note that for a given $k_j$, the solution for $d_j$ in Eqn. (8.3) is

$$d_j = -x_0 \cdot k_j + y_0, \qquad (8.4)$$

which is another equation for a line, where now $k_j, d_j$ are the variables and $x_0, y_0$ are the constant parameters of the equation. The solution set $\{(k_j, d_j)\}$ of Eqn. (8.4) describes the parameters of all possible lines $L_j$ passing through the image point $p_0 = (x_0, y_0)$.

Fig. 8.4
A set of lines passing through an image point. For all possible lines $L_j$ passing through the point $p_0 = (x_0, y_0)$, the equation $y_0 = k_j x_0 + d_j$ holds for appropriate values of the parameters $k_j, d_j$.

For an arbitrary image point $p_i = (x_i, y_i)$, Eqn. (8.4) describes the line

$$M_i:\; d = -x_i \cdot k + y_i \qquad (8.5)$$

with the parameters $-x_i, y_i$ in the so-called parameter or Hough space, spanned by the coordinates k, d. The relationship between (x, y) image space and (k, d) parameter space can be summarized as follows:


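$$\begin{array}{lcl}
\text{Image space } (x, y) & & \text{Parameter space } (k, d)\\
\text{Point } p_i = (x_i, y_i) & \longmapsto & \text{Line } M_i:\ d = -x_i \cdot k + y_i\\
\text{Line } L_j:\ y = k_j \cdot x + d_j & \longmapsto & \text{Point } q_j = (k_j, d_j)
\end{array}$$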

Each image point $p_i$ and its associated line bundle correspond to exactly one line $M_i$ in parameter space. Therefore we are interested in those places in the parameter space where lines intersect. The example in Fig. 8.5 illustrates how the lines $M_1$ and $M_2$ intersect at the position $q_{12} = (k_{12}, d_{12})$ in the parameter space, which means that $(k_{12}, d_{12})$ are the parameters of the line in the image space that runs through both image points $p_1$ and $p_2$. The more lines $M_i$ that intersect at a single point in the parameter space, the more image space points lie on the corresponding line in the image! In general, we can state:

If N lines intersect at position $(k', d')$ in parameter space, then N image points lie on the corresponding line $y = k' x + d'$ in image space.
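A short worked example may help. For $p_1 = (1, 3)$ and $p_2 = (3, 7)$, Eqn. (8.5) gives the parameter-space lines $M_1: d = -k + 3$ and $M_2: d = -3k + 7$, which intersect at $k_{12} = 2$, $d_{12} = 1$, that is, at the image line y = 2x + 1. The following Java sketch (class and method names are illustrative, not from the book) computes this intersection; a third point on the same image line, such as (5, 11), produces a line $M_3$ that passes through the same point $q_{12}$.

```java
// Parameter space of the slope-intercept model: the image point (xi, yi)
// maps to the line M_i: d = -xi * k + yi (Eqn. 8.5).
class SlopeInterceptHough {
    // d-value of line M_i at slope k
    static double dOnLine(double xi, double yi, double k) {
        return -xi * k + yi;
    }

    // Intersection q12 = (k12, d12) of M_1 and M_2, i.e., the parameters of the
    // image line through p1 and p2. Returns null if x1 == x2: M_1 and M_2 then
    // have the same slope (parallel or identical), and the image line is vertical.
    static double[] intersect(double x1, double y1, double x2, double y2) {
        if (x1 == x2) {
            return null;
        }
        double k12 = (y2 - y1) / (x2 - x1);   // from -x1*k + y1 = -x2*k + y2
        double d12 = y1 - k12 * x1;
        return new double[] {k12, d12};
    }
}
```

The null case is exactly the vertical-line singularity of the slope-intercept form that motivates the Hessian normal form introduced below.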


Fig. 8.5
Relationship between image space and parameter space. The parameter values for all possible lines passing through the image point $p_i = (x_i, y_i)$ in image space (a) lie on a single line $M_i$ in parameter space (b). This means that each point $q_j = (k_j, d_j)$ in parameter space corresponds to a single line $L_j$ in image space. The intersection of the two lines $M_1, M_2$ at the point $q_{12} = (k_{12}, d_{12})$ in parameter space indicates that a line $L_{12}$ with parameters $k_{12}, d_{12}$ through the two image points $p_1$ and $p_2$ exists in the image space. Panels: (a) x/y image space; (b) k/d parameter space.


8.2.2  Accumulator  Map 

Finding the dominant lines in the image can now be reformulated as finding all the locations in parameter space where a significant number of lines intersect. This is basically the goal of the HT. In order to compute the HT, we must first decide on a discrete representation of the continuous parameter space by selecting an appropriate step size for the k and d axes. Once we have selected step sizes for the coordinates, we can represent the space naturally using a 2D array. Since the array will be used to keep track of the number of times parameter space lines intersect, it is called an “accumulator” array. Each parameter space line is painted into the accumulator array and the cells through which it passes are incremented, so that ultimately each cell accumulates the total number of lines that intersect at that cell (Fig. 8.6).

Fig. 8.6
The accumulator map is a discrete representation of the parameter space (k, d). For each image point found (a), a discrete line in the parameter space (b) is drawn. This operation is performed additively so that the values of the array through which the line passes are incremented by 1. The value at each cell of the accumulator array is the number of parameter space lines that intersect it (in this case 2).


8.2.3  A  Better  Line  Representation 

The line representation in Eqn. (8.1) is not used in practice because for vertical lines the slope is infinite, that is, $k = \infty$. A more practical representation is the so-called Hessian normal form (HNF)1 for representing lines,

$$x \cdot \cos(\theta) + y \cdot \sin(\theta) = r, \qquad (8.6)$$

which does not exhibit such singularities and also provides a natural linear quantization for its parameters, the angle θ and the radius r (Fig. 8.7).

With the HNF representation, the parameter space is defined by the coordinates θ, r, and a point p = (x, y) in image space corresponds to the relation

$$r(\theta) = x \cdot \cos(\theta) + y \cdot \sin(\theta), \qquad (8.7)$$

for angles in the range $0 \le \theta < \pi$ (see Fig. 8.8). Thus, for a given image point p, the associated radius r is simply a function of the angle θ. If we use the center of the image (of size M × N),

$$\mathbf{x}_r = \begin{pmatrix} x_r \\ y_r \end{pmatrix} = \frac{1}{2} \cdot \begin{pmatrix} M \\ N \end{pmatrix}, \qquad (8.8)$$

as the reference point for the x/y image coordinates, then it is possible to limit the range of the radius to half the diagonal of the image, that is,

$$-r_{\max} \le r(\theta) \le r_{\max}, \quad\text{with}\quad r_{\max} = \tfrac{1}{2}\sqrt{M^2 + N^2}. \qquad (8.9)$$

1 The Hessian normal form is a normalized version of the general (“algebraic”) line equation Ax + By + C = 0, with A = cos(θ), B = sin(θ), and C = −r (see, e.g., [35, p. 194]).

Fig. 8.7
Representation of lines in 2D. In the common k, d representation (a), vertical lines pose a problem because k = ∞. The Hessian normal form (b) avoids this problem by representing a line by its angle θ and distance r from the origin.

We can see that the function r(θ) in Eqn. (8.7) is the sum of a cosine and a sine function of θ, weighted by the x and y coordinates of the image point, respectively (assumed to be constant for the moment). The result is again a sinusoidal function whose magnitude and phase depend only on the weights (coefficients) x, y; specifically, $r(\theta) = \sqrt{x^2 + y^2} \cdot \cos(\theta - \operatorname{atan2}(y, x))$. Thus, with the Hessian parameterization θ/r, an image point (x, y) does not create a straight line in the accumulator map A(i, j) but a unique sinusoidal curve, as shown in Fig. 8.8. Again, each image point adds a curve to the accumulator, and each resulting cluster point corresponds to a dominant line in the image with a proportional number of points on it.2
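To check numerically that all points of one image line produce curves that pass through the same parameter point, the following sketch evaluates Eqn. (8.7) for points generated on a line with a given angle $\theta_0$ and radius $r_0$. The class name HnfDemo is hypothetical; the helper rAsSinusoid writes the same curve as a single cosine with magnitude $\sqrt{x^2+y^2}$ and phase atan2(y, x), and rMax implements Eqn. (8.9).

```java
// Hessian normal form helpers (Eqns. 8.7 and 8.9); (x, y) are measured
// relative to the reference point x_r at the image center.
class HnfDemo {
    // Eqn. (8.7): radius of the line with normal angle theta through (x, y)
    static double r(double x, double y, double theta) {
        return x * Math.cos(theta) + y * Math.sin(theta);
    }

    // The same curve written as one sinusoid:
    // magnitude sqrt(x^2 + y^2), phase atan2(y, x)
    static double rAsSinusoid(double x, double y, double theta) {
        return Math.hypot(x, y) * Math.cos(theta - Math.atan2(y, x));
    }

    // Eqn. (8.9): largest possible |r| for an M x N image
    static double rMax(int M, int N) {
        return 0.5 * Math.sqrt((double) M * M + (double) N * N);
    }
}
```

Points of the form $(r_0 \cos\theta_0 - t \sin\theta_0,\; r_0 \sin\theta_0 + t \cos\theta_0)$ all lie on the line $(\theta_0, r_0)$, and r(x, y, θ0) returns $r_0$ for every t. For the 360 × 240 image of Fig. 8.9, rMax(360, 240) returns about 216.3.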


Fig. 8.8
Image space and parameter space using the HNF representation. The image (a) of size M × N contains four straight lines $L_a, \ldots, L_d$. Each point on an image line creates a sinusoidal curve in the θ/r parameter space (b), and the corresponding line parameters are indicated by the clearly visible cluster points in the accumulator map. The reference point $\mathbf{x}_r$ for the x/y coordinates lies at the center of the image. The line angles $\theta_i$ are in the range [0, π) and the associated radii $r_i$ are in $[-r_{\max}, r_{\max}]$ (the length $r_{\max}$ is half of the image diagonal). For example, the angle $\theta_a$ of line $L_a$ is approximately π/3, with the (positive) radius $r_a \approx 0.4\, r_{\max}$. Note that, with this parameterization, line $L_c$ has the angle $\theta_c \approx 2\pi/3$ and the negative radius $r_c \approx -0.4\, r_{\max}$.
2 Note that, in Fig. 8.8(a), the positive direction of the y-coordinate runs upwards (unlike our usual convention for image coordinates) to stay in line with the previous illustrations (and high school geometry). In practice, the consequences are minor: only the rotation angle runs in the opposite direction, and thus the accumulator image in Fig. 8.8(b) was mirrored horizontally for proper display.


8.3  Hough  Algorithm 




The fundamental Hough algorithm using the HNF line representation (Eqn. (8.6)) is given in Alg. 8.1. Starting with a binary image I(u, v) where the edge pixels have been assigned a value of 1, the first stage creates a 2D accumulator array and then iterates over the image to fill it. For an accumulator with m angular and n radial steps, the resulting increments are

$$d_\theta = \pi/m \quad\text{and}\quad d_r = \sqrt{M^2 + N^2}/n \qquad (8.10)$$

for the angle θ and the radius r, respectively. The discrete indices of the accumulator cells are denoted i and j, with $j_0 = n \div 2$ (integer division) as the center index (for r = 0).

For each relevant image point (u, v), a sinusoidal curve is added to the accumulator map by stepping over the discrete angles $\theta_i = \theta_0, \ldots, \theta_{m-1}$, calculating the corresponding radius3

$$r(\theta_i) = (u - x_r) \cdot \cos(\theta_i) + (v - y_r) \cdot \sin(\theta_i) \qquad (8.11)$$

(see Eqn. (8.7)) and its discrete index

$$j = j_0 + \operatorname{round}\!\left(\frac{r(\theta_i)}{d_r}\right), \qquad (8.12)$$

and subsequently incrementing the accumulator cell A(i, j) by one (see Alg. 8.1, lines 10-17). The line parameters $\theta_i$ and $r_j$ for a given accumulator position (i, j) can be calculated as

$$\theta_i = i \cdot d_\theta \quad\text{and}\quad r_j = (j - j_0) \cdot d_r. \qquad (8.13)$$

In the second stage of Alg. 8.1, the accumulator array is searched for local peaks above a given minimum value $a_{\min}$. For each detected peak, a line object is created of the form

$$L_k = (\theta_k, r_k, a_k), \qquad (8.14)$$

consisting of the angle $\theta_k$, the radius $r_k$ (relative to the reference point $\mathbf{x}_r$), and the corresponding accumulator value $a_k$. The resulting sequence of lines $C = (L_1, L_2, \ldots)$ is then sorted by descending $a_k$ and returned.

Figure 8.9 shows the result of applying the Hough transform to a very noisy binary image, which obviously contains four straight lines. They appear clearly as cluster points in the corresponding accumulator map in Fig. 8.9(b). Figure 8.9(c) shows the reconstruction of these lines from the extracted parameters. In this example, the resolution of the discrete parameter space is set to 256 × 256 cells.4

3 The frequent (and expensive) calculation of cos(θi) and sin(θi) in Eqn. (8.11) and Alg. 8.1 (line 15) can be easily avoided by initially tabulating the function values for all m possible angles θ0, ..., θm−1, which should yield a significant performance gain.

4 Note that drawing a straight line given in Hessian normal form is not really a trivial task (see Exercises 8.1-8.2 for details).
Alg. 8.1
Hough algorithm for detecting straight lines. The algorithm returns a sorted list of straight lines of the form $L_k = (\theta_k, r_k, a_k)$ for the binary input image I of size M × N. The resolution of the discrete Hough accumulator map (and thus the step size for the angle and radius) is specified by parameters m and n, respectively. $a_{\min}$ defines the minimum accumulator value, that is, the minimum number of image points on any detected line. The function IsLocalMax() used in line 20 is the same as in Alg. 7.1 (see p. 151).


1:  HoughTransformLines(I, m, n, a_min)
    Input: I, a binary image of size M × N; m, angular accumulator steps;
    n, radial accumulator steps; a_min, minimum accumulator count per line.
    Returns a sorted sequence C = (L1, L2, ...) of the most dominant lines found.
2:      (M, N) ← Size(I)
3:      (x_r, y_r) ← ½ · (M, N)                  ▷ reference point x_r (image center)
4:      d_θ ← π/m                                ▷ angular step size
5:      d_r ← √(M² + N²)/n                       ▷ radial step size
6:      j_0 ← n ÷ 2                              ▷ map index for r = 0
        Step 1 - set up and fill the Hough accumulator:
7:      Create map A: [0, m−1] × [0, n−1] ↦ ℤ    ▷ accumulator
8:      for all accumulator cells (i, j) do
9:          A(i, j) ← 0                          ▷ initialize accumulator
10:     for all (u, v) ∈ M × N do                ▷ scan the image
11:         if I(u, v) > 0 then                  ▷ I(u, v) is a foreground pixel
12:             (x, y) ← (u − x_r, v − y_r)      ▷ shift to reference
13:             for i ← 0, ..., m − 1 do         ▷ angular coordinate i
14:                 θ ← d_θ · i                  ▷ angle, 0 ≤ θ < π
15:                 r ← x · cos(θ) + y · sin(θ)  ▷ see Eqn. (8.7)
16:                 j ← j_0 + round(r/d_r)       ▷ radial coordinate j
17:                 A(i, j) ← A(i, j) + 1        ▷ increment A(i, j)
        Step 2 - extract the most dominant lines:
18:     C ← ()                                   ▷ start with empty sequence of lines
19:     for all accumulator cells (i, j) do      ▷ collect local maxima
20:         if (A(i, j) ≥ a_min) ∧ IsLocalMax(A, i, j) then
21:             θ ← i · d_θ                      ▷ angle θ
22:             r ← (j − j_0) · d_r              ▷ radius r
23:             a ← A(i, j)                      ▷ accumulated value a
24:             L ← (θ, r, a)                    ▷ create a new line L
25:             C ← C ⌢ (L)                      ▷ add line L to sequence C
26:     Sort(C)                                  ▷ sort C by descending accumulator count a
27:     return C
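Alg. 8.1 can be translated into Java roughly as follows. This is a sketch under a few assumptions, not the book's implementation: the binary image is passed as an int array indexed I[u][v] (u = column, v = row), cos and sin are tabulated once as suggested in footnote 3, votes whose index j falls outside [0, n−1] (which rounding can produce at the image border) are skipped, and isLocalMax is a hypothetical stand-in for IsLocalMax from Alg. 7.1 that compares a cell only with its existing 8 neighbors, ignoring the circular θ axis.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of Alg. 8.1; I[u][v] > 0 marks an edge (foreground) pixel.
class HoughLines {
    // Detected line L_k = (theta_k, r_k, a_k); r is relative to the image center.
    static final class Line {
        final double angle, radius;
        final int count;
        Line(double angle, double radius, int count) {
            this.angle = angle; this.radius = radius; this.count = count;
        }
    }

    // Step 1: fill the m x n accumulator.
    static int[][] accumulate(int[][] I, int m, int n) {
        int M = I.length, N = I[0].length;
        double xr = 0.5 * M, yr = 0.5 * N;                 // reference point (image center)
        double dTheta = Math.PI / m;                       // angular step size
        double dR = Math.sqrt((double) M * M + (double) N * N) / n; // radial step size
        int j0 = n / 2;                                    // map index for r = 0
        double[] cosT = new double[m], sinT = new double[m];
        for (int i = 0; i < m; i++) {                      // tabulate cos/sin once
            cosT[i] = Math.cos(dTheta * i);
            sinT[i] = Math.sin(dTheta * i);
        }
        int[][] A = new int[m][n];                         // Java initializes to 0
        for (int u = 0; u < M; u++) {
            for (int v = 0; v < N; v++) {
                if (I[u][v] <= 0) continue;                // skip background pixels
                double x = u - xr, y = v - yr;             // shift to reference point
                for (int i = 0; i < m; i++) {
                    double r = x * cosT[i] + y * sinT[i];  // Eqn. (8.7)
                    int j = j0 + (int) Math.round(r / dR); // Eqn. (8.12)
                    if (j >= 0 && j < n) A[i][j]++;        // guard against border rounding
                }
            }
        }
        return A;
    }

    // Step 2: collect local maxima with at least aMin votes, strongest first.
    static List<Line> extractLines(int[][] A, int M, int N, int aMin) {
        int m = A.length, n = A[0].length;
        double dTheta = Math.PI / m;
        double dR = Math.sqrt((double) M * M + (double) N * N) / n;
        int j0 = n / 2;
        List<Line> lines = new ArrayList<>();
        for (int i = 0; i < m; i++) {
            for (int j = 0; j < n; j++) {
                if (A[i][j] >= aMin && isLocalMax(A, i, j)) {
                    lines.add(new Line(i * dTheta, (j - j0) * dR, A[i][j])); // Eqn. (8.13)
                }
            }
        }
        lines.sort((p, q) -> Integer.compare(q.count, p.count)); // descending a_k
        return lines;
    }

    // Hypothetical helper: true if A[i][j] is >= all of its existing 8 neighbors.
    static boolean isLocalMax(int[][] A, int i, int j) {
        for (int di = -1; di <= 1; di++) {
            for (int dj = -1; dj <= 1; dj++) {
                int ii = i + di, jj = j + dj;
                if ((di != 0 || dj != 0) && ii >= 0 && ii < A.length
                        && jj >= 0 && jj < A[0].length && A[ii][jj] > A[i][j]) {
                    return false;
                }
            }
        }
        return true;
    }
}
```

For a 64 × 48 image containing a single horizontal line at v = 30, with m = 180 and n = 160, the strongest detected line has the angle π/2, the radius 6 (the line lies 6 pixels from the image center), and 64 votes. Note that this simple isLocalMax accepts plateaus, so neighboring cells with equal counts may both be reported.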

8.3.1  Processing  the  Accumulator  Array 

The reliable detection and precise localization of peaks in the accumulator map A(i, j) is not a trivial problem. As can readily be seen in Fig. 8.9(b), even in the case where the lines in the image are geometrically “straight”, the parameter space curves associated with them do not intersect at exactly one point in the accumulator array; rather, their intersection points are distributed within a small area. This is primarily caused by the rounding errors introduced by the discrete coordinate grid used for the accumulator array. Because the maxima are really small regions of high values in the accumulator array, simply traversing the array and returning the positions of its largest values is not sufficient. Since this is a critical step in the algorithm, we examine two different approaches below (see Fig. 8.10).
Fig. 8.9
Hough transform for straight lines. The dimensions of the original image (a) are 360 × 240 pixels, so the maximal radius (measured from the image center) is $r_{\max} \approx 216$. For the parameter space (b), a resolution of 256 steps is used for both the angle θ = 0, ..., π (horizontal axis) and the radius $r = -r_{\max}, \ldots, r_{\max}$ (vertical axis). The four (dark) clusters in (b) surround the maximum values in the accumulator array, and their parameters correspond to the four lines in the original image. Intensities are shown inverted in all images to improve legibility.


Approach  A:  Thresholding 

First the accumulator is thresholded at the value $t_a$ by setting all accumulator values $A(i, j) < t_a$ to 0. The resulting scattering of points, or point clouds, are first coalesced into regions (Fig. 8.10(b)) using a technique such as a morphological closing operation (see Sec. 9.3.2). Next the remaining regions must be localized, for instance using the region-finding technique from Sec. 10.1, and then each region's centroid (see Sec. 10.5) can be utilized as the (noninteger) coordinates for the potential image space line. Often the sum of the accumulator values within a region is used as a measure of the strength (number of image points) of the line it represents.
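Approach A can be sketched as follows, assuming the accumulator is an int array A[i][j]. For brevity, the sketch skips the morphological closing step, labels 8-connected regions of cells with $A(i, j) \ge t_a$ by a simple flood fill (in place of the region-finding technique of Sec. 10.1), and returns for each region its unweighted centroid and the sum of its accumulator values. The class and field names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class AccumulatorRegions {
    // One region: centroid (ci, cj) in accumulator coordinates and summed votes.
    static final class Peak {
        final double ci, cj;
        final long sum;
        Peak(double ci, double cj, long sum) { this.ci = ci; this.cj = cj; this.sum = sum; }
    }

    static List<Peak> findPeaks(int[][] A, int tA) {
        int m = A.length, n = A[0].length;
        boolean[][] visited = new boolean[m][n];
        List<Peak> peaks = new ArrayList<>();
        for (int i0 = 0; i0 < m; i0++) {
            for (int j0 = 0; j0 < n; j0++) {
                if (A[i0][j0] < tA || visited[i0][j0]) continue;
                // flood fill one 8-connected region of cells with A >= tA
                Deque<int[]> stack = new ArrayDeque<>();
                stack.push(new int[] {i0, j0});
                visited[i0][j0] = true;
                int count = 0;
                double si = 0, sj = 0;
                long sum = 0;
                while (!stack.isEmpty()) {
                    int[] p = stack.pop();
                    int i = p[0], j = p[1];
                    count++;
                    si += i;
                    sj += j;
                    sum += A[i][j];
                    for (int di = -1; di <= 1; di++) {
                        for (int dj = -1; dj <= 1; dj++) {
                            int ii = i + di, jj = j + dj;
                            if (ii >= 0 && ii < m && jj >= 0 && jj < n
                                    && !visited[ii][jj] && A[ii][jj] >= tA) {
                                visited[ii][jj] = true;
                                stack.push(new int[] {ii, jj});
                            }
                        }
                    }
                }
                peaks.add(new Peak(si / count, sj / count, sum));  // centroid + strength
            }
        }
        return peaks;
    }
}
```

The centroid coordinates can then be converted to line parameters with Eqn. (8.13), which also accepts noninteger i and j.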

Approach  B:  Nonmaximum  suppression 

In  this  method,  local  maxima  in  the  accumulator  array  are  found  by 
suppressing  nonmaximal  values.5  This  is  carried  out  by  determining 
for  every  accumulator  cell  A (i,j)  whether  the  value  is  higher  than 
the  value  of  all  of  its  neighboring  cells.  If  this  is  the  case,  then 
the  value  remains  the  same;  otherwise  it  is  set  to  0  (Fig.  8.10(c)). 
The  (integer)  coordinates  of  the  remaining  peaks  are  potential  line 
parameters,  and  their  respective  heights  correlate  with  the  strength 
of  the  image  space  line  they  represent.  This  method  can  be  used 
in  conjunction  with  a  threshold  operation  to  reduce  the  number  of 
candidate  points  that  must  be  considered.  The  result  for  Fig.  8.9(a) 
is  shown  in  Fig.  8.10(d). 
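A minimal sketch of approach B, assuming an int accumulator A[i][j]: a cell survives only if its value is at least the threshold tA and strictly greater than all of its existing 8 neighbors; all other cells are set to 0. Border cells are compared only with neighbors inside the array, so the circular θ axis is not handled here. The class name is illustrative.

```java
class NonMaxSuppression {
    // Returns a copy of A in which every cell that is below tA or not strictly
    // greater than all of its 8 neighbors is set to 0.
    static int[][] suppress(int[][] A, int tA) {
        int m = A.length, n = A[0].length;
        int[][] B = new int[m][n];                 // all zeros initially
        for (int i = 0; i < m; i++) {
            for (int j = 0; j < n; j++) {
                int a = A[i][j];
                if (a < tA) continue;              // threshold reduces candidates
                boolean isMax = true;
                for (int di = -1; di <= 1 && isMax; di++) {
                    for (int dj = -1; dj <= 1; dj++) {
                        int ii = i + di, jj = j + dj;
                        if ((di != 0 || dj != 0) && ii >= 0 && ii < m
                                && jj >= 0 && jj < n && A[ii][jj] >= a) {
                            isMax = false;         // a neighbor is as high or higher
                            break;
                        }
                    }
                }
                if (isMax) B[i][j] = a;            // keep the peak value
            }
        }
        return B;
    }
}
```

Because the comparison is strict, a plateau of equal maximal values is suppressed entirely; a variant using ≥ would instead report every cell of such a plateau.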


5  Nonmaximum  suppression  is  also  used  in  Sec.  7.2.3  for  isolating  corner 
points. 
Fig. 8.10
Finding local maximum values in the accumulator array. Original distribution of the values in the Hough accumulator (a). Variant A: threshold operation using 50% of the maximum value (b). The remaining regions represent the four dominant lines in the image, and the coordinates of their centroids are a good approximation to the line parameters. Variant B: using non-maximum suppression results in a large number of local maxima (c) that must then be reduced using a threshold operation (d).
Mind the vertical lines!

Special consideration should be given to vertical lines (once more!) when processing the contents of the accumulator map. The parameter pairs for these lines lie near θ = 0 and θ = π at the left and right borders, respectively, of the accumulator map (see Fig. 8.8(b)). Thus, to locate peak clusters in this part of the parameter space, the horizontal coordinate along the θ axis must be treated circularly, that is, modulo m. However, as can be seen clearly in Fig. 8.8(b), the sinusoidal traces in the parameter space do not continue smoothly at the transition θ = π → 0, but are vertically mirrored! Evaluating such neighborhoods near the borders of the parameter space thus requires special treatment of the vertical (r) accumulator coordinate.
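The mirroring follows from the fact that the parameter pairs (θ + π, r) and (θ, −r) describe the same line, since x · cos(θ + π) + y · sin(θ + π) = −(x · cos θ + y · sin θ). With $r_j = (j - j_0) \cdot d_r$ from Eqn. (8.13), stepping across the θ border therefore maps the radial index j to $2 j_0 - j$. The hypothetical helper below reads an accumulator cell for an angular index that may lie just outside [0, m − 1], which is what a neighborhood test at the left or right border needs; cells mapped outside the radial range are treated as 0 (an assumption).

```java
class CircularAccumulator {
    // Returns A(i, j) for an angular index i in [-m, 2m - 1]. Crossing the
    // theta = pi -> 0 border uses (theta + pi, r) == (theta, -r): i wraps
    // modulo m and the radial index is mirrored about j0 = n / 2.
    static int get(int[][] A, int i, int j) {
        int m = A.length, n = A[0].length, j0 = n / 2;
        if (i < 0 || i >= m) {
            i = Math.floorMod(i, m);   // circular angular coordinate
            j = 2 * j0 - j;            // r -> -r
        }
        return (j >= 0 && j < n) ? A[i][j] : 0;   // outside radial range: 0
    }
}
```

A local-maximum test such as the one in Alg. 8.1 can then read all 8 neighbors through this helper instead of indexing A directly.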

8.3.2  Hough  Transform  Extensions 

So  far,  we  have  presented  the  Hough  transform  only  in  its  most  basic 
formulation.  The  following  is  a  list  of  some  of  the  more  common 
methods  of  improving  and  refining  the  method. 

Modified  accumulation 

The purpose of the accumulator map is to locate the intersections of multiple 2D curves. Due to the discrete nature of the image and accumulator coordinates, rounding errors usually cause the parameter curves not to intersect in a single accumulator cell, even when the associated image lines are exactly straight. A common remedy is, for a given angle $\theta = i \cdot d_\theta$ (Alg. 8.1), to increment not only the main accumulator cell A(i, j) but also the neighboring cells A(i, j − 1) and A(i, j + 1), possibly with different weights. This makes the Hough transform more tolerant against inaccurate point coordinates and rounding errors.
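The modified accumulation step might look like this in Java. The weights are illustrative parameters (for example, 2 for the main cell and 1 for each radial neighbor); the source does not prescribe specific values. The class name SpreadVoting is hypothetical, and neighbor votes that fall outside the radial range are dropped.

```java
class SpreadVoting {
    // Increments the main cell A(i, j) by wCenter and its radial neighbors
    // A(i, j - 1) and A(i, j + 1) by wSide (weights chosen by the caller).
    static void vote(int[][] A, int i, int j, int wCenter, int wSide) {
        int n = A[i].length;
        A[i][j] += wCenter;
        if (j - 1 >= 0) A[i][j - 1] += wSide;   // lower radial neighbor
        if (j + 1 < n) A[i][j + 1] += wSide;    // upper radial neighbor
    }
}
```

In Alg. 8.1, such a call would replace the single increment in line 17.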

Considering  edge  strength  and  orientation 

Until now, the raw data for the Hough transform was typically an edge map that was interpreted as a binary image with ones at potential edge points. Yet edge maps contain additional information, such as the edge strength E(u, v) and local edge orientation Φ(u, v) (see Sec. 6.3), which can be used to improve the results of the HT.

The edge strength E(u, v) is especially easy to take into consideration. Instead of incrementing visited accumulator cells by 1, add the strength of the respective edge, that is,

$$A(i, j) \leftarrow A(i, j) + E(u, v). \qquad (8.15)$$

In this way, strong edge points will contribute more to the accumulated values than weak ones (see also Exercise 8.6).

The local edge orientation Φ(u, v) is also useful for limiting the range of possible orientation angles for the line at (u, v). The angle Φ(u, v) can be used to increase the efficiency of the algorithm by reducing the number of accumulator cells to be considered along the θ axis. Since this also reduces the number of irrelevant “votes” in the accumulator, it increases the overall sensitivity of the Hough transform (see, e.g., [125, p. 483]).
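Both ideas can be combined in a single voting routine, sketched below. It assumes that Φ(u, v) denotes the direction of the intensity gradient, so that the normal angle θ of the HNF line through (u, v) is close to Φ modulo π; dPhi is a hypothetical tolerance parameter, and the accumulator is a double array because votes are weighted by E(u, v) as in Eqn. (8.15). The class name is illustrative.

```java
class OrientedVoting {
    // Adds the weighted votes of one edge point (u, v) with edge strength e and
    // (assumed) gradient orientation phi in radians. Only angles theta_i within
    // +/- dPhi of phi (modulo pi) are visited; each vote adds e (cf. Eqn. 8.15).
    static void vote(double[][] A, int M, int N, int u, int v,
                     double e, double phi, double dPhi) {
        int m = A.length, n = A[0].length;
        double dTheta = Math.PI / m;
        double dR = Math.sqrt((double) M * M + (double) N * N) / n;
        int j0 = n / 2;
        double x = u - 0.5 * M, y = v - 0.5 * N;   // relative to image center
        for (int i = 0; i < m; i++) {
            double theta = i * dTheta;
            // angular difference modulo pi, folded into [0, pi/2]
            double diff = Math.abs(Math.IEEEremainder(theta - phi, Math.PI));
            if (diff > dPhi) continue;             // orientation does not match
            double r = x * Math.cos(theta) + y * Math.sin(theta);
            int j = j0 + (int) Math.round(r / dR);
            if (j >= 0 && j < n) A[i][j] += e;     // weighted vote
        }
    }
}
```

With a small dPhi, each edge point contributes to only a few θ columns instead of all m, which both saves time and reduces the background of irrelevant votes.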

Bias  compensation 

Since the value of a cell in the Hough accumulator represents the number of image points falling on a line, longer lines naturally have higher values than shorter lines. This may seem like an obvious point to make, but consider when the image only contains a small section of a “long” line. For instance, if a line only passes through the corner of an image, then the cells representing it in the accumulator array will naturally have lower values than a “shorter” line that lies entirely within the image (Fig. 8.11). It follows then that if we only search the accumulator array for maximal values, it is likely that we will completely miss short line segments. One way to compensate for this inherent bias is to compute for each accumulator entry A(i, j) the maximum number of image points $A_{\max}(i, j)$ possible for a line with the corresponding parameters and then normalize the result, for example, in the form

$$A(i, j) \leftarrow \frac{A(i, j)}{\max(1, A_{\max}(i, j))}. \qquad (8.16)$$

Fig. 8.11
Hough transform bias problem. When an image represents only a finite section of an object, then those lines nearer the center (smaller r values) will have higher values than those farther away (larger r values). As an example, the maximum value of the accumulator for line a will be higher than that of line b.


The normalization map A_max(i, j) can be determined analytically (by calculating the intersecting length of each line) or by simulation; for example, by computing the Hough transform of an image with the same dimensions in which all pixels are edge pixels, or by using a random image in which the pixels are uniformly distributed.
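The simulation approach can be sketched as follows: A_max is simply the accumulator of an all-edge image of the same size, and Eqn. (8.16) is then applied cell by cell. The class name and its conventions (reference point at the image center, θ_i = i·π/m, r_j = (j − n/2)·Δr with Δr = 2·r_max/n) are illustrative assumptions, not the book's implementation.

```java
import java.util.Arrays;

// Sketch of bias compensation by simulation (Eqn. (8.16)).
// Assumed conventions: reference point at the image center,
// theta_i = i * pi / m, r_j = (j - n/2) * dRad with dRad = 2 * rMax / n.
final class BiasCompensation {

    /** Plain counting HT of a binary image I[v][u] (true = edge pixel). */
    static int[][] hough(boolean[][] I, int m, int n) {
        int h = I.length, w = I[0].length;
        double xr = w / 2.0, yr = h / 2.0;
        double dRad = 2 * Math.hypot(xr, yr) / n;
        int[][] A = new int[m][n];
        for (int v = 0; v < h; v++) {
            for (int u = 0; u < w; u++) {
                if (!I[v][u]) continue;
                for (int i = 0; i < m; i++) {
                    double theta = Math.PI * i / m;
                    double r = (u - xr) * Math.cos(theta) + (v - yr) * Math.sin(theta);
                    int j = (int) Math.round(r / dRad) + n / 2;
                    if (j >= 0 && j < n) A[i][j]++;
                }
            }
        }
        return A;
    }

    /** Amax(i, j): accumulator of an all-edge image of the same size. */
    static int[][] maxMap(int w, int h, int m, int n) {
        boolean[][] all = new boolean[h][w];
        for (boolean[] row : all) Arrays.fill(row, true);
        return hough(all, m, n);
    }

    /** A(i, j) / max(1, Amax(i, j)) for every cell. */
    static double[][] normalize(int[][] A, int[][] Amax) {
        double[][] N = new double[A.length][A[0].length];
        for (int i = 0; i < A.length; i++)
            for (int j = 0; j < A[0].length; j++)
                N[i][j] = A[i][j] / (double) Math.max(1, Amax[i][j]);
        return N;
    }
}
```

Since A_max depends only on the image size and the accumulator resolution, it can be computed once and reused for all images of the same size.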


Line  endpoints 

Our simple version of the Hough transform determines the parameters of the lines in the image but not their endpoints. These could be found in a subsequent step by determining which image points belong to any detected line (e.g., by applying a threshold to the perpendicular distance between the ideal line, defined by its parameters, and the actual image points). An alternative solution is to calculate the extreme points of the line during the computation of the accumulator array. For this, every cell of the accumulator array is supplemented with four additional coordinates, that is,

$$ A(i, j) = (a, u_{\min}, v_{\min}, u_{\max}, v_{\max}), \tag{8.17} $$


where component a denotes the original accumulator value and u_min, v_min, u_max, v_max are the coordinates of the line’s bounding box. After the additional coordinates are initialized, they are updated simultaneously with the positions along the parameter trace for every image point (u, v). After completion of the process, the accumulator cell (i, j) contains the bounding box for all image points that contributed to it. When finding the maximum values in the second stage, care should be taken so that the merged cells contain the correct endpoints (see also Exercise 8.4).
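A possible realization of the extended accumulator cells of Eqn. (8.17) is sketched below; it keeps four additional arrays next to the vote counts. Names and index conventions are hypothetical. Note that the bounding box alone does not tell along which diagonal the segment runs: for a line with positive slope (in image coordinates) the endpoints are (u_min, v_min) and (u_max, v_max), otherwise (u_min, v_max) and (u_max, v_min).

```java
import java.util.Arrays;

// Sketch of an accumulator whose cells also track the bounding box
// (u_min, v_min, u_max, v_max) of all contributing points (Eqn. (8.17)).
// Assumed conventions: reference point at the image center,
// theta_i = i * pi / m, r_j = (j - n/2) * dRad with dRad = 2 * rMax / n.
final class EndpointHough {
    final int m, n;
    final double xr, yr, dRad;
    final int[][] count, uMin, vMin, uMax, vMax;

    EndpointHough(int w, int h, int m, int n) {
        this.m = m;
        this.n = n;
        xr = w / 2.0;
        yr = h / 2.0;
        dRad = 2 * Math.hypot(xr, yr) / n;
        count = new int[m][n];
        uMin = new int[m][n]; vMin = new int[m][n];
        uMax = new int[m][n]; vMax = new int[m][n];
        for (int i = 0; i < m; i++) {       // empty boxes: min = +inf, max = -inf
            Arrays.fill(uMin[i], Integer.MAX_VALUE);
            Arrays.fill(vMin[i], Integer.MAX_VALUE);
            Arrays.fill(uMax[i], Integer.MIN_VALUE);
            Arrays.fill(vMax[i], Integer.MIN_VALUE);
        }
    }

    /** Adds the votes of edge point (u, v) and updates the touched bounding boxes. */
    void addPoint(int u, int v) {
        for (int i = 0; i < m; i++) {
            double theta = Math.PI * i / m;
            double r = (u - xr) * Math.cos(theta) + (v - yr) * Math.sin(theta);
            int j = (int) Math.round(r / dRad) + n / 2;
            if (j < 0 || j >= n) continue;
            count[i][j]++;
            uMin[i][j] = Math.min(uMin[i][j], u);
            vMin[i][j] = Math.min(vMin[i][j], v);
            uMax[i][j] = Math.max(uMax[i][j], u);
            vMax[i][j] = Math.max(vMax[i][j], v);
        }
    }

    /** Returns {u_min, v_min, u_max, v_max} of cell (i, j). */
    int[] boundingBox(int i, int j) {
        return new int[] {uMin[i][j], vMin[i][j], uMax[i][j], vMax[i][j]};
    }
}
```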


Hierarchical  Hough  transform 

The accuracy of the results increases with the size of the parameter space used; for example, 256 steps along the θ axis (covering [0, π)) are equivalent to searching for lines at every 180°/256 ≈ 0.7°. While increasing the number of accumulator cells provides a finer result, bear in mind that it also increases the computation time and especially the amount of memory required.

Instead of increasing the resolution of the entire parameter space, the idea of the hierarchical HT is to gradually “zoom” in and refine the parameter space. First, the regions containing the most important lines are found using a relatively low-resolution parameter space, and then the parameter spaces of those regions are recursively passed to the HT and examined at a higher resolution. In this way, a relatively exact determination of the parameters can be found using a (comparatively) limited parameter space.
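The coarse-to-fine idea can be illustrated with the following sketch, which refines only the single strongest line. The fixed number of cells per level and the zoom window of ±1.5 cells around the previous winner are arbitrary choices for this example; a practical implementation would refine several regions and handle the wrap-around of the θ axis, which this sketch ignores (the refined θ may therefore fall slightly outside [0, π)).

```java
// Sketch of a coarse-to-fine ("hierarchical") Hough transform for the single
// strongest line. Assumptions (illustrative only): points are given relative
// to the reference point, every level uses bins x bins cells, and each new
// level zooms into a window of +/- 1.5 cells around the previous winner.
final class HierarchicalHough {

    /** Returns {theta, r} of the strongest line after the given number of levels. */
    static double[] strongestLine(double[][] pts, int bins, int levels, double rMax) {
        double tLo = 0, tHi = Math.PI, rLo = -rMax, rHi = rMax;
        double bestT = 0, bestR = 0;
        for (int level = 0; level < levels; level++) {
            double dT = (tHi - tLo) / bins, dR = (rHi - rLo) / bins;
            int[][] A = new int[bins][bins];
            for (double[] p : pts) {
                for (int i = 0; i < bins; i++) {
                    double t = tLo + (i + 0.5) * dT;              // bin center
                    double r = p[0] * Math.cos(t) + p[1] * Math.sin(t);
                    int j = (int) Math.floor((r - rLo) / dR);
                    if (j >= 0 && j < bins) A[i][j]++;
                }
            }
            int bi = 0, bj = 0;                                   // strongest cell
            for (int i = 0; i < bins; i++)
                for (int j = 0; j < bins; j++)
                    if (A[i][j] > A[bi][bj]) { bi = i; bj = j; }
            bestT = tLo + (bi + 0.5) * dT;
            bestR = rLo + (bj + 0.5) * dR;
            // next level: same number of cells over a much smaller window
            tLo = bestT - 1.5 * dT; tHi = bestT + 1.5 * dT;
            rLo = bestR - 1.5 * dR; rHi = bestR + 1.5 * dR;
        }
        return new double[] {bestT, bestR};
    }
}
```

With 32 × 32 cells per level, each level shrinks the cell size by a factor of 32/3, so four levels reach a resolution that a single flat accumulator could only match with millions of cells.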


Line  intersections 

It may be useful in certain applications not to find the lines themselves but their intersections, for example, for precisely locating the corner points of a polygon-shaped object. The Hough transform delivers the parameters of the recovered lines in Hessian normal form (that is, as pairs L_k = (θ_k, r_k)). To compute the point of intersection x₁₂ = (x₁₂, y₁₂)ᵀ for two lines L₁ = (θ₁, r₁) and L₂ = (θ₂, r₂), we need to solve the system of linear equations

$$ \begin{aligned} x_{12} \cdot \cos(\theta_1) + y_{12} \cdot \sin(\theta_1) &= r_1, \\ x_{12} \cdot \cos(\theta_2) + y_{12} \cdot \sin(\theta_2) &= r_2, \end{aligned} \tag{8.18} $$

for the unknowns x₁₂, y₁₂. The solution is


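$$ \begin{pmatrix} x_{12} \\ y_{12} \end{pmatrix} = \frac{1}{\cos(\theta_1)\sin(\theta_2) - \cos(\theta_2)\sin(\theta_1)} \begin{pmatrix} r_1 \sin(\theta_2) - r_2 \sin(\theta_1) \\ r_2 \cos(\theta_1) - r_1 \cos(\theta_2) \end{pmatrix} = \frac{1}{\sin(\theta_2 - \theta_1)} \begin{pmatrix} r_1 \sin(\theta_2) - r_2 \sin(\theta_1) \\ r_2 \cos(\theta_1) - r_1 \cos(\theta_2) \end{pmatrix}, \tag{8.19} $$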


for sin(θ₂ − θ₁) ≠ 0. Obviously x₁₂ is undefined (no intersection point exists) if the lines L₁, L₂ are parallel to each other (i.e., if θ₁ = θ₂).
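Eqn. (8.19) translates directly into code. In the sketch below, parallel (or nearly parallel) lines are reported as null; the tolerance 1e-12 is an arbitrary choice. Since all lines of one Hough transform are specified relative to the same reference point x_r, the returned coordinates are relative to x_r as well and must be shifted by x_r to obtain image coordinates.

```java
// Sketch: intersection of two HNF lines (theta1, r1) and (theta2, r2),
// following Eqn. (8.19). Both lines must refer to the same reference point;
// the result is relative to that point.
final class LineIntersection {

    /** Returns {x12, y12}, or null if the lines are (nearly) parallel. */
    static double[] intersect(double t1, double r1, double t2, double r2) {
        double det = Math.sin(t2 - t1);     // = cos(t1) sin(t2) - cos(t2) sin(t1)
        if (Math.abs(det) < 1e-12) {
            return null;                    // parallel lines: no intersection
        }
        double x = (r1 * Math.sin(t2) - r2 * Math.sin(t1)) / det;
        double y = (r2 * Math.cos(t1) - r1 * Math.cos(t2)) / det;
        return new double[] {x, y};
    }
}
```

For example, the vertical line (θ = 0, r = 3) and the horizontal line (θ = π/2, r = 4) intersect at (3, 4) relative to the reference point.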

Figure 8.12 shows an illustrative example using ARToolkit⁶ markers. After automatic thresholding (see Ch. 11), the straight line segments along the outer boundary of the largest binary region are analyzed with the Hough transform. Subsequently, the corners of the marker are calculated precisely as the intersection points of the involved line segments.


8.4  Java  Implementation 

The complete Java source code for the straight line Hough transform is available online in class HoughTransformLines.⁷ Detailed usage of this class is shown in the ImageJ plugin Find_Straight_Lines (see also Prog. 8.1 for a minimal example).⁸

HoughTransformLines (class)

This class is a direct implementation of the Hough transform for straight lines, as outlined in Alg. 8.1. The sin/cos function calls (see Alg. 8.1, line 15) are substituted by precalculated tables for improved efficiency. The class defines the following constructors:

HoughTransformLines(ImageProcessor I, Parameters params)

I denotes the input image, where all pixel values > 0 are assumed to be relevant (edge) points; params is an instance of the (inner) class HoughTransformLines.Parameters, which allows specifying the accumulator size (nAng, nRad), etc.

6 Used for augmented reality applications, see www.hitl.washington.edu/artoolkit/.

7 Package imagingbook.pub.hough.

8 Note that the current implementation has no bias compensation (see Sec. 8.3.2, Fig. 8.11).
Fig. 8.12

Hough transform used for precise calculation of corner points. Original image showing a typical ARToolkit marker (a), result after automatic thresholding (b). The outer contour pixels of the largest binary region (c) are used as input points to the Hough transform. Hough accumulator map (d), detected lines and marked intersection points (e).


HoughTransformLines(Point2D[] points, int M, int N, Parameters params)

In this case the Hough transform is calculated for a sequence of 2D points (points); M, N specify the associated coordinate frame (for calculating the reference point x_r), which is typically the original image size; params is a parameter object (as described before).

The most important public methods of the class HoughTransformLines are:

HoughLine[] getLines(int amin, int maxLines)

Returns a sorted sequence of line objects⁹ whose accumulator value is amin or greater. The sequence is sorted by accumulator values and contains up to maxLines elements.

int[][] getAccumulator()

Returns a reference to the accumulator map A (of size m × n for angles and radii, respectively).


9 Of type HoughTransformLines.HoughLine.
1  import imagingbook....HoughTransformLines;
2  import imagingbook....HoughTransformLines.HoughLine;
3  import imagingbook....HoughTransformLines.Parameters;
4  ...
5
6  public void run(ImageProcessor ip) {
7    Parameters params = new Parameters();
8    params.nAng = 256;  // = m
9    params.nRad = 256;  // = n
10
11   // compute the Hough transform:
12   HoughTransformLines ht =
13       new HoughTransformLines(ip, params);
14
15   // retrieve the 5 strongest lines with min. 50 accumulator votes
16   HoughLine[] lines = ht.getLines(50, 5);
17
18   if (lines.length > 0) {
19     IJ.log("Lines found:");
20     for (HoughLine L : lines) {
21       IJ.log(L.toString());  // list the resulting lines
22     }
23   }
24   else
25     IJ.log("No lines found!");
26 }


Prog. 8.1

Minimal example for the usage of class HoughTransformLines (run() method for an ImageJ plugin of type PlugInFilter). First (in lines 7–9) a parameter object is created and configured; nAng (= m) and nRad (= n) specify the number of discrete angular and radial steps in the Hough accumulator map. In lines 12–13 an instance of HoughTransformLines is created for the image ip. The accumulator map is calculated in this step. In line 16, getLines() is called to retrieve the sequence of the 5 strongest detected lines, with at least 50 image points each. Unless empty, this sequence is subsequently listed.


int[][] getAccumulatorMax()

Returns a copy of the accumulator array in which all non-maxima are replaced by zero values.

FloatProcessor getAccumulatorImage()

Returns a floating-point image of the accumulator array, analogous to getAccumulator(). Angles θ_i run horizontally, radii r_j vertically.

FloatProcessor getAccumulatorMaxImage()

Returns a floating-point image of the accumulator array with suppressed non-maximum values, analogous to getAccumulatorMax().


double angleFromIndex(int i)

Returns the angle θ_i ∈ [0, π) for the given index i in the range 0, ..., m − 1.


double radiusFromIndex(int j)

Returns the radius r_j ∈ [−r_max, r_max] for the given index j in the range 0, ..., n − 1.


Point2D getReferencePoint()

Returns the (fixed) reference point x_r for this Hough transform instance.
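To make the index conventions and the non-maximum suppression more concrete, here is a self-contained sketch of what methods like angleFromIndex(), radiusFromIndex(), getAccumulatorMax() and getLines() might compute. This is not the library code: the mappings θ_i = i·π/m and r_j = (j − n/2)·2r_max/n, the 8-neighbor maximum test (without wrap-around of the θ axis), and all names are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helpers, not the library's implementation. Assumed index
// conventions: theta_i = i * pi / m and r_j = (j - n/2) * (2 * rMax / n).
final class AccumulatorTools {

    static double angleFromIndex(int i, int m) {
        return i * Math.PI / m;
    }

    static double radiusFromIndex(int j, int n, double rMax) {
        return (j - n / 2) * (2 * rMax / n);
    }

    /** Copy of A in which every zero cell and every cell smaller than one of
     *  its 8 neighbors is set to zero. The theta axis is not wrapped here. */
    static int[][] suppressNonMaxima(int[][] A) {
        int m = A.length, n = A[0].length;
        int[][] B = new int[m][n];
        for (int i = 0; i < m; i++) {
            for (int j = 0; j < n; j++) {
                int a = A[i][j];
                boolean isMax = a > 0;
                for (int di = -1; di <= 1 && isMax; di++)
                    for (int dj = -1; dj <= 1 && isMax; dj++) {
                        int ii = i + di, jj = j + dj;
                        if (ii >= 0 && ii < m && jj >= 0 && jj < n && A[ii][jj] > a)
                            isMax = false;
                    }
                if (isMax) B[i][j] = a;
            }
        }
        return B;
    }

    /** Local maxima with value >= amin as {i, j, value}, strongest first, at most maxLines. */
    static List<int[]> strongest(int[][] A, int amin, int maxLines) {
        int[][] B = suppressNonMaxima(A);
        List<int[]> cells = new ArrayList<>();
        for (int i = 0; i < B.length; i++)
            for (int j = 0; j < B[0].length; j++)
                if (B[i][j] > 0 && B[i][j] >= amin) cells.add(new int[] {i, j, B[i][j]});
        cells.sort((p, q) -> Integer.compare(q[2], p[2]));
        return new ArrayList<>(cells.subList(0, Math.min(maxLines, cells.size())));
    }
}
```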
HoughLine (class)

HoughLine represents a straight line in Hessian normal form. It is implemented as an inner class of HoughTransformLines. It offers no public constructor but the following methods:

double getAngle()

Returns the angle θ ∈ [0, π) of this line.


double getRadius()

Returns the radius r ∈ [−r_max, r_max] of this line, relative to the associated Hough transform’s reference point x_r.


int getCount()

Returns the Hough transform’s accumulator value (number of registered image points) for this line.

Point2D getReferencePoint()

Returns the (fixed) reference point x_r for this line. Note that all lines associated with a given Hough transform share the same reference point.


double getDistance(Point2D p)

Returns the Euclidean distance of point p to this line. The result may be positive or negative, depending on which side of the line p is located.
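A signed distance of the kind described for getDistance() can be computed as the quantity d in Eqn. (8.23) (Exercise 8.1). The following stand-alone sketch uses a hypothetical class HnfLine instead of the library's HoughLine; because the normal vector (cos θ, sin θ) has unit length, d is the Euclidean distance with a side-dependent sign.

```java
// Sketch of an HNF line (theta, r) given relative to a reference point
// (xr, yr), with the signed point-to-line distance of Eqn. (8.23).
// HnfLine is a hypothetical stand-in, not the library's HoughLine class.
final class HnfLine {
    final double theta, r, xr, yr;

    HnfLine(double theta, double r, double xr, double yr) {
        this.theta = theta;
        this.r = r;
        this.xr = xr;
        this.yr = yr;
    }

    /** Signed Euclidean distance of (x, y); the sign tells the side of the line. */
    double distance(double x, double y) {
        return (x - xr) * Math.cos(theta) + (y - yr) * Math.sin(theta) - r;
    }
}
```

For the line θ = 0, r = 2 relative to x_r = (10, 10), i.e., the vertical line x = 12, the points (15, 0) and (9, 5) lie at distances +3 and −3.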


8.5  Hough  Transform  for  Circles  and  Ellipses 

8.5.1  Circles  and  Arcs 

Since lines in 2D have two degrees of freedom, they can be completely specified using two real-valued parameters. In a similar fashion, representing a circle in 2D requires three parameters, for example,

$$ C = (\bar{x}, \bar{y}, \bar{r}), $$

where x̄, ȳ are the coordinates of the center and r̄ is the radius of the circle (Fig. 8.13).

Fig. 8.13

Representation of circles and ellipses in 2D. A circle (a) requires three parameters (e.g., x̄, ȳ, r̄). An arbitrary ellipse (b) takes five parameters (e.g., x̄, ȳ, r_a, r_b, α).


A point p = (x, y) lies exactly on the circle C if the condition

$$ (x - \bar{x})^2 + (y - \bar{y})^2 = \bar{r}^2 \tag{8.20} $$

holds. Therefore the Hough transform for circles requires a 3D parameter space A(i, j, k) to find the position and radius of circles (and circular arcs) in an image. Unlike the HT for lines, there is no simple functional dependency between the coordinates in parameter space; so how can we find every parameter combination (x̄, ȳ, r̄) that satisfies Eqn. (8.20) for a given image point (x, y)? A “brute force” approach is to exhaustively test all cells of the parameter space to see if the relation in Eqn. (8.20) holds, which is, of course, computationally quite expensive.

If we examine Fig. 8.14, we can see that a better idea might be to make use of the fact that the coordinates of the center points also form a circle in Hough space. It is therefore not necessary to search the entire 3D parameter space for each image point. Instead we need only increase the cell values along the edge of the appropriate circle on each r plane of the accumulator array. To do this, we can adapt any of the standard algorithms for generating circles. In this case, the integer math version of the well-known Bresenham algorithm [33] is particularly well-suited.


Figure 8.15 shows the spatial structure of the 3D parameter space for circles. For a given image point p_m = (u_m, v_m), at each plane along the r axis (for r_k = r_min, ..., r_max), a circle centered at (u_m, v_m) with the radius r_k is traversed, ultimately creating a 3D cone-shaped surface in the parameter space. The coordinates of the dominant circles can be found by searching the accumulator space for the cells with the highest values; that is, the cells where the most cones intersect. Just as in the linear HT, the bias problem (see Sec. 8.3.2) also occurs in the circle HT. Sections of circles (i.e., arcs) can be found in a similar way, in which case the maximum value possible for a given cell is proportional to the arc length.
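The voting scheme for circles can be sketched as follows. The sketch uses the integer midpoint circle algorithm (the usual circle variant of Bresenham's method) to precompute the offsets of a discrete circle once per radius, removing duplicate points at the octant boundaries so that each image point casts at most one vote per cell and radius. The accumulator layout A[k][y][x] with r_k = r_min + k and all names are assumptions of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the circle Hough transform: every image point votes along a
// discrete circle of radius r_k on each plane k of the 3D accumulator.
final class CircleHough {

    /** Offsets of a discrete circle of radius r (midpoint algorithm), without duplicates. */
    static List<int[]> circleOffsets(int r) {
        boolean[][] seen = new boolean[2 * r + 1][2 * r + 1];
        List<int[]> pts = new ArrayList<>();
        int x = r, y = 0, err = 1 - r;
        while (x >= y) {
            int[][] oct = {{x, y}, {y, x}, {-y, x}, {-x, y},
                           {-x, -y}, {-y, -x}, {y, -x}, {x, -y}};   // 8-way symmetry
            for (int[] o : oct) {
                if (!seen[o[0] + r][o[1] + r]) {
                    seen[o[0] + r][o[1] + r] = true;
                    pts.add(o);
                }
            }
            y++;
            if (err < 0) {
                err += 2 * y + 1;
            } else {
                x--;
                err += 2 * (y - x) + 1;
            }
        }
        return pts;
    }

    /** A[k][yc][xc] counts votes for a circle of radius rMin + k centered at (xc, yc). */
    static int[][][] accumulate(List<int[]> points, int w, int h, int rMin, int rMax) {
        int[][][] A = new int[rMax - rMin + 1][h][w];
        for (int k = 0; k <= rMax - rMin; k++) {
            List<int[]> offs = circleOffsets(rMin + k);       // computed once per radius
            for (int[] p : points) {
                for (int[] o : offs) {
                    int xc = p[0] + o[0], yc = p[1] + o[1];
                    if (xc >= 0 && xc < w && yc >= 0 && yc < h) A[k][yc][xc]++;
                }
            }
        }
        return A;
    }
}
```

Because the offset set is symmetric, every point of a discrete circle votes exactly once for its true center, so the center cell reaches the number of circle points.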

8.5.2  Ellipses 

In a perspective image, most circular objects originating in our real, 3D world will actually appear in 2D images as ellipses, except in the case where the object lies on the optical axis and is observed from the front. For this reason, perfectly circular structures seldom occur in photographs. While the Hough transform can still be used to find ellipses, the larger parameter space required makes it substantially more expensive.

Fig. 8.14

Hough transform for circles. The illustration depicts a single slice of the 3D accumulator array A(i, j, k) at a given circle radius r_k. The center points of all the circles running through a given image point p₁ = (x₁, y₁) form a circle C₁ with a radius of r_k centered around p₁, just as the center points of the circles that pass through p₂ and p₃ lie on the circles C₂, C₃. The cells along the edges of the three circles C₁, C₂, C₃ of radius r_k are traversed and their values in the accumulator array incremented. The cell in the accumulator array contains a value of 3 where the circles intersect at the true center of the image circle C.

Fig. 8.15

3D parameter space for circles. For each image point p = (u, v), the cells lying on a cone (with its axis at (u, v) and varying radius r_k) in the 3D accumulator A(i, j, k) are traversed and incremented. The size of the discrete accumulator is set to 100 × 100 × 30. Candidate center points are found where many of the 3D surfaces intersect. (Figure annotations: 3D parameter space x_i, y_j = 0, ..., 100, r_k = 10, ..., 30; image points p₁ = (30, 50), p₂ = (50, 50), p₃ = (40, 40), p₄ = (80, 20).)

A general ellipse in 2D has five degrees of freedom and therefore requires five parameters to represent it, for example,

$$ E = (\bar{x}, \bar{y}, r_a, r_b, \alpha), \tag{8.21} $$

where (x̄, ȳ) are the coordinates of the center point, (r_a, r_b) are the two radii, and α is the orientation of the principal axis (Fig. 8.13).¹⁰ In order to find ellipses of any size, position, and orientation using the Hough transform, a 5D parameter space with a suitable resolution in each dimension is required. A simple calculation illustrates the enormous expense of representing this space: using a resolution of only 128 = 2⁷ steps in every dimension results in 2³⁵ accumulator cells, and implementing these using 4-byte int values thus requires 2³⁷ bytes (128 gigabytes) of memory. Moreover, the amount of processing required for filling and evaluating such a huge parameter space makes this method unattractive for real applications.

An interesting alternative in this case is the generalized Hough transform, which in principle can be used for detecting any arbitrary 2D shape [15, 117]. Using the generalized Hough transform, the shape of the sought-after contour is first encoded point by point in a table and then the associated parameter space is related to the position (x_c, y_c), scale s, and orientation θ of the shape. This requires a 4D space, which is smaller than that of the Hough method for ellipses described earlier.
10 See Chapter 10, Eqn. (10.39) for a parametric equation of this ellipse.
8.6 Exercises


Exercise 8.1. Drawing a straight line given in Hessian normal form (HNF) is not directly possible because typical graphics environments can only draw lines between two specified end points.¹¹ An HNF line L = (θ, r), specified relative to a reference point x_r = (x_r, y_r), can be drawn into an image I in several ways (implement both versions):

Version 1: Iterate over all image points (u, v); if Eqn. (8.11), that is,

$$ r = (u - x_r) \cdot \cos(\theta) + (v - y_r) \cdot \sin(\theta), \tag{8.22} $$

is satisfied for position (u, v), then mark the pixel I(u, v). Of course, this “brute force” method will only show those (few) line pixels whose positions satisfy the line equation exactly. To obtain a more “tolerant” drawing method, we first reformulate Eqn. (8.22) to

$$ (u - x_r) \cdot \cos(\theta) + (v - y_r) \cdot \sin(\theta) - r = d. \tag{8.23} $$

Obviously, Eqn. (8.22) is exactly satisfied only if d = 0 in Eqn. (8.23). If, however, Eqn. (8.22) is not satisfied, then the magnitude of d ≠ 0 equals the distance of the point (u, v) from the line. Note that d itself may be positive or negative, depending on which side of the line the point (u, v) is located. This suggests the following version.

Version 2: Define a constant w > 0. Iterate over all image positions (u, v); whenever the inequality

$$ \bigl| (u - x_r) \cdot \cos(\theta) + (v - y_r) \cdot \sin(\theta) - r \bigr| < w \tag{8.24} $$

is satisfied for position (u, v), mark the pixel I(u, v). For example, all line points should show with w = 1. What is the geometric meaning of w?

Exercise 8.2. Develop a less “brutal” method (compared to Exercise 8.1) for drawing a straight line L = (θ, r) in Hessian normal form (HNF). First, set up the HNF equations for the four border lines of the image, A, B, C, D. Now determine the intersection points of the given line L with each border line A, ..., D and use the built-in drawLine() method or a similar routine to draw L by connecting the intersection points. Consider which special situations may appear and how they could be handled.

Exercise  8.3.  Implement  (or  extend)  the  Hough  transform  for 
straight  lines  by  including  measures  against  the  bias  problem,  as 
discussed  in  Sec.  8.3.2  (Eqn.  (8.16)). 

Exercise 8.4. Implement (or extend) the Hough transform for finding lines that takes into account line endpoints, as described in Sec. 8.3.2 (Eqn. (8.17)).

Exercise 8.5. Calculate the pairwise intersection points of all detected lines (see Eqns. (8.18)–(8.19)) and show the results graphically.


11 For example, with drawLine(x1, y1, x2, y2) in ImageJ.
Exercise 8.6. Extend the Hough transform for straight lines so that updating the accumulator map takes into account the intensity (edge magnitude) of the current pixel, as described in Eqn. (8.15).

Exercise  8.7.  Implement  a  hierarchical  Hough  transform  for  straight 
lines  (see  p.  172)  capable  of  accurately  determining  line  parameters. 

Exercise  8.8.  Implement  the  Hough  transform  for  finding  circles 
and  circular  arcs  with  varying  radii.  Make  use  of  a  fast  algorithm  for 
drawing  circles  in  the  accumulator  array,  such  as  described  in  Sec. 
8.5. 
Morphological Filters


In  the  discussion  of  the  median  filter  in  Chapter  5  (Sec.  5.4.2),  we 
noticed  that  this  type  of  filter  can  somehow  alter  2D  image  structures. 
Figure  9.1  illustrates  once  more  how  corners  are  rounded  off,  holes  of 
a  certain  size  are  filled,  and  small  structures,  such  as  single  dots  or 
thin  lines,  are  removed.  The  median  filter  thus  responds  selectively  to 
the  local  shape  of  image  structures,  a  property  that  might  be  useful 
for  other  purposes  if  it  can  be  applied  not  just  randomly  but  in  a 
controlled  fashion.  Altering  the  local  structure  in  a  predictable  way 
is  exactly  what  “morphological”  filters  can  do,  which  we  focus  on  in 
this  chapter. 


Fig. 9.1

Median filter applied to a binary image: original image (a) and results from a 3 × 3 pixel median filter (b) and a 5 × 5 pixel median filter (c).


In their original form, morphological filters are aimed at binary images, images with only two possible pixel values, 0 and 1 or black and white, respectively. Binary images are found in many places, in particular in digital printing, document transmission (FAX) and storage, or as selection masks in image and video editing. Binary images can be obtained from grayscale images by simple thresholding (see Sec. 4.1.4) using either a global or a locally varying threshold value. We denote binary pixels with values 1 and 0 as foreground and background pixels, respectively. In most of the following examples, the foreground pixels are shown in black and background pixels are shown in white, as is common in printing.

At the end of this chapter, we will see that morphological filters are applicable not only to binary images but also to grayscale and even color images, though these operations differ significantly from their binary counterparts.

Fig. 9.2

Basic idea of size-dependent removal of image structures. Small structures may be eliminated by iterative shrinking and subsequent growing. Ideally, the “surviving” structures should be restored to their original shape.


9.1  Shrink  and  Let  Grow 

Our starting point was the observation that a simple 3 × 3 pixel median filter can round off larger image structures and remove smaller structures, such as points and thin lines, in a binary image. This could be useful to eliminate structures that are below a certain size (e.g., to clean an image from noise or dirt). But how can we control the size and possibly the shape of the structures affected by such an operation?

Although its structural effects may be interesting, we disregard the median filter at this point and start with this task again from the beginning. Let’s assume that we want to remove small structures from a binary image without significantly altering the remaining larger structures. The key idea for accomplishing this could be the following (Fig. 9.2):

1. First, all structures in the image are iteratively “shrunk” by peeling off a layer of a certain thickness around the boundaries.

2. Shrinking removes the smaller structures step by step, and only the larger structures remain.

3. The remaining structures are then grown back by the same amount.

4. Eventually the larger regions should have returned to approximately their original shapes, while the smaller regions have disappeared from the image.

All  we  need  for  this  are  two  types  of  operations.  “Shrinking”  means 
to  remove  a  layer  of  pixels  from  a  foreground  region  around  all  its 
borders  against  the  background  (Fig.  9.3).  The  other  way  around, 
“growing”,  adds  a  layer  of  pixels  around  the  border  of  a  foreground 
region  (Fig.  9.4). 
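The four steps above can be tried out with the following minimal sketch, in which “shrinking” removes every foreground pixel that touches the background within its 3 × 3 (8-)neighborhood and “growing” adds every background pixel that touches the foreground. Treating pixels outside the image as background and using the 8-neighborhood are assumptions of this example; the class and method names are hypothetical.

```java
// Sketch of "shrink and let grow" on a binary image I[v][u] (true = foreground).
// Pixels outside the image are treated as background.
final class ShrinkGrow {

    static boolean fg(boolean[][] I, int u, int v) {
        return v >= 0 && v < I.length && u >= 0 && u < I[0].length && I[v][u];
    }

    /** Removes every foreground pixel that has a background pixel among its 8 neighbors. */
    static boolean[][] shrink(boolean[][] I) {
        int h = I.length, w = I[0].length;
        boolean[][] R = new boolean[h][w];
        for (int v = 0; v < h; v++) {
            for (int u = 0; u < w; u++) {
                boolean keep = I[v][u];
                for (int dv = -1; dv <= 1 && keep; dv++)
                    for (int du = -1; du <= 1 && keep; du++)
                        keep = fg(I, u + du, v + dv);
                R[v][u] = keep;
            }
        }
        return R;
    }

    /** Adds every background pixel that has a foreground pixel among its 8 neighbors. */
    static boolean[][] grow(boolean[][] I) {
        int h = I.length, w = I[0].length;
        boolean[][] R = new boolean[h][w];
        for (int v = 0; v < h; v++) {
            for (int u = 0; u < w; u++) {
                boolean set = false;
                for (int dv = -1; dv <= 1 && !set; dv++)
                    for (int du = -1; du <= 1 && !set; du++)
                        set = fg(I, u + du, v + dv);
                R[v][u] = set;
            }
        }
        return R;
    }

    /** k shrinking steps followed by k growing steps. */
    static boolean[][] shrinkAndGrow(boolean[][] I, int k) {
        boolean[][] R = I;
        for (int s = 0; s < k; s++) R = shrink(R);
        for (int s = 0; s < k; s++) R = grow(R);
        return R;
    }
}
```

With k = 1, an isolated dot disappears while a 5 × 5 square shrinks to 3 × 3 and is then restored exactly; with the 4-neighborhood instead, the square would come back with its corners cut off.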
Fig. 9.3

“Shrinking” a foreground region by removing a layer of border pixels: original image (a), identified foreground pixels that are in direct contact with the background (b), and result after shrinking (c).

Fig. 9.4

“Growing” a foreground region by attaching a layer of pixels: original image (a), identified background pixels that are in direct contact with the region (b), and result after growing (c).


9.1.1  Neighborhood  of  Pixels 

For both operations, we must define the meaning of two pixels being adjacent (i.e., being “neighbors”). Two definitions of “neighborhood” are commonly used for rectangular pixel grids (Fig. 9.5):

• 4-neighborhood (𝒩₄): the four pixels adjacent to a given pixel in the horizontal and vertical directions;

• 8-neighborhood (𝒩₈): the pixels contained in 𝒩₄ plus the four adjacent pixels along the diagonals.
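In code, a neighborhood is conveniently represented as a list of coordinate offsets (Δu, Δv). A small sketch; the order of the offsets is arbitrary and not meant to reproduce the numbering in Fig. 9.5, and the names are hypothetical:

```java
// Offsets (du, dv) of the 4- and 8-neighborhood; N8 = N4 plus the diagonals.
final class Neighborhoods {
    static final int[][] N4 = {{1, 0}, {0, -1}, {-1, 0}, {0, 1}};
    static final int[][] N8 = {{1, 0}, {0, -1}, {-1, 0}, {0, 1},
                               {1, -1}, {-1, -1}, {-1, 1}, {1, 1}};

    /** True if (u2, v2) lies in neighborhood N of (u1, v1). */
    static boolean areNeighbors(int u1, int v1, int u2, int v2, int[][] N) {
        for (int[] d : N) {
            if (u1 + d[0] == u2 && v1 + d[1] == v2) return true;
        }
        return false;
    }
}
```

Two diagonally touching pixels, such as (0, 0) and (1, 1), are neighbors under 𝒩₈ but not under 𝒩₄.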


Fig. 9.5

Definitions of “neighborhood” on a rectangular pixel grid: 4-neighborhood 𝒩₄ = {N₁, ..., N₄} and 8-neighborhood 𝒩₈ = 𝒩₄ ∪ {N₅, ..., N₈}.


9.2  Basic  Morphological  Operations 

Shrinking and growing are indeed the two most basic morphological operations, which are referred to as “erosion” and “dilation”, respectively. These morphological operations, however, are much more general than illustrated in the example in Sec. 9.1. They go well beyond removing or attaching single pixel layers and, in combination, can perform much more complex operations.

9.2.1  The  Structuring  Element 

Similar to the coefficient matrix of a linear filter (see Sec. 5.2), the properties of a morphological filter are specified by elements in a matrix called a “structuring element”. In binary morphology, the structuring element (just like the image itself) contains only the values 0 and 1,

$$ H(i, j) \in \{0, 1\}, $$

and the hot spot marks the origin of the coordinate system of H (Fig. 9.6). Notice that the hot spot is not necessarily located at the center of the structuring element, nor must its value be 1.

Fig. 9.6

Binary structuring element (example). 1-elements are marked with •; 0-cells are empty. The hot spot (boxed) is not necessarily located at the center.

Fig. 9.7

A binary image I or a structuring element H can each be described as a set of coordinate pairs, Q_I and Q_H, respectively. The dark shaded element in H marks the coordinate origin (hot spot).


9.2.2  Point  Sets 

For the formal specification of morphological operations, it is sometimes helpful to describe binary images as sets of 2D coordinate points.¹

For a binary image I(u, v) ∈ {0, 1}, the corresponding point set Q_I consists of the coordinate pairs p = (u, v) of all foreground pixels,

$$ Q_I = \{ p \mid I(p) = 1 \}. \tag{9.1} $$

Of  course,  as  shown  in  Fig.  9.7,  not  only  a  binary  image  I  but  also  a 
structuring  element  H  can  be  described  as  a  point  set. 
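Converting between the two representations is straightforward. The sketch below uses a Java record Pt for coordinate pairs (Java 16 or later) and assumes images stored as I[v][u] with values 0/1; these are illustrative choices, not part of the book's code.

```java
import java.util.HashSet;
import java.util.Set;

// Sketch: binary image <-> point set Q_I = { p | I(p) = 1 } (Eqn. (9.1)).
final class PointSets {
    record Pt(int u, int v) {}

    /** Collects the coordinates of all foreground pixels of I[v][u]. */
    static Set<Pt> toPointSet(int[][] I) {
        Set<Pt> Q = new HashSet<>();
        for (int v = 0; v < I.length; v++)
            for (int u = 0; u < I[v].length; u++)
                if (I[v][u] == 1) Q.add(new Pt(u, v));
        return Q;
    }

    /** Renders a point set into a w x h image; points outside the frame are dropped. */
    static int[][] toImage(Set<Pt> Q, int w, int h) {
        int[][] I = new int[h][w];
        for (Pt p : Q)
            if (p.u() >= 0 && p.u() < w && p.v() >= 0 && p.v() < h) I[p.v()][p.u()] = 1;
        return I;
    }
}
```

For the image in Fig. 9.7, toPointSet yields {(1, 1), (2, 1), (2, 2)}.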

(Fig. 9.7 example: I = Q_I = {(1, 1), (2, 1), (2, 2)} and H = Q_H = {(0, 0), (1, 0)}.)

With the description as point sets, fundamental operations on binary images can also be expressed as simple set operations. For example, inverting a binary image I → Ī (i.e., exchanging foreground and background) is equivalent to building the complementary set

$$ Q_{\bar{I}} = \bar{Q}_I = \{ p \in \mathbb{Z}^2 \mid p \notin Q_I \}. \tag{9.2} $$

Combining two binary images I₁ and I₂ by an OR operation between corresponding pixels, the resulting point set is the union of the individual point sets Q_{I₁} and Q_{I₂}; that is,

$$ Q_{I_1 \vee I_2} = Q_{I_1} \cup Q_{I_2}. \tag{9.3} $$

Since a point set Q_I is only an alternative representation of the binary image I (i.e., I ≡ Q_I), we will use both image and set notations synonymously in the following. For example, we simply write Ī instead of Q̄_I for an inverted image as in Eqn. (9.2), or I₁ ∪ I₂ instead of Q_{I₁} ∪ Q_{I₂} in Eqn. (9.3). The meaning should always be clear in the given context.
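Complement and union map directly onto Java set operations. Because the complement in Eqn. (9.2) is taken over the infinite plane, the sketch restricts it to a given w × h image frame, an assumption needed for any finite representation; the Pt record and all names are illustrative.

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of inversion (complement) and OR (union) on point sets.
final class SetOps {
    record Pt(int u, int v) {}

    /** Complement restricted to the w x h image frame (Eqn. (9.2) ranges over all of Z^2). */
    static Set<Pt> complement(Set<Pt> Q, int w, int h) {
        Set<Pt> R = new HashSet<>();
        for (int v = 0; v < h; v++)
            for (int u = 0; u < w; u++) {
                Pt p = new Pt(u, v);
                if (!Q.contains(p)) R.add(p);
            }
        return R;
    }

    /** Pixel-wise OR of two binary images = union of their point sets (Eqn. (9.3)). */
    static Set<Pt> union(Set<Pt> A, Set<Pt> B) {
        Set<Pt> R = new HashSet<>(A);
        R.addAll(B);
        return R;
    }
}
```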

1 Morphology is a mathematical discipline dealing with the algebraic analysis of geometrical structures and shapes, with strong roots in set theory.
Translating (shifting) a binary image I by some coordinate vector d creates a new image with the content

$$ I_d(p + d) = I(p) \quad \text{or} \quad I_d(p) = I(p - d), \tag{9.4} $$

which is equivalent to changing the coordinates of the original point set in the form

$$ I_d = \{ (p + d) \mid p \in I \}. \tag{9.5} $$

In some cases, it is also necessary to reflect (mirror) a binary image or point set about its origin, which we denote as

$$ I^{*} = \{ -p \mid p \in I \}. \tag{9.6} $$
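Translation and reflection of point sets, again as a sketch with a hypothetical Pt record:

```java
import java.util.HashSet;
import java.util.Set;

// Sketch: shifting a point set by d and mirroring it about the origin.
final class PointTransforms {
    record Pt(int u, int v) {}

    /** I_d = { p + d | p in I } (Eqn. (9.5)). */
    static Set<Pt> translate(Set<Pt> I, Pt d) {
        Set<Pt> R = new HashSet<>();
        for (Pt p : I) R.add(new Pt(p.u() + d.u(), p.v() + d.v()));
        return R;
    }

    /** Reflection about the origin: I* = { -p | p in I }. */
    static Set<Pt> reflect(Set<Pt> I) {
        Set<Pt> R = new HashSet<>();
        for (Pt p : I) R.add(new Pt(-p.u(), -p.v()));
        return R;
    }
}
```

Reflecting the structuring element H = {(0, 0), (1, 0)} of Fig. 9.7, for instance, gives {(0, 0), (−1, 0)}.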


9.2.3 Dilation

A dilation is the morphological operation that corresponds to our intuitive concept of “growing” as discussed already. As a set operation, it is defined as

$$I \oplus H = \{ (p + q) \mid p \in I,\ q \in H \}. \qquad (9.7)$$

Thus the point set produced by a dilation is the (vector) sum of all possible pairs of coordinate points from the original sets $I$ and $H$, as illustrated by a simple example in Fig. 9.8. Alternatively, one could view the dilation as the structuring element $H$ being replicated at each foreground pixel of the image $I$ or, conversely, the image $I$ being replicated at each foreground element of $H$. Expressed in set notation,² this is

$$I \oplus H = \bigcup_{p \in I} H_p = \bigcup_{q \in H} I_q, \qquad (9.8)$$

with $H_p$, $I_q$ denoting the sets $H$, $I$ shifted by $p$ and $q$, respectively (see Eqn. (9.5)).
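Eqn. (9.7) translates almost literally into code. The sketch below (hypothetical class `SetDilation`, with an assumed coordinate type `Pt` and a convenience builder `setOf`) forms all pairwise vector sums; iterating over $I$ in the outer loop corresponds to the union of shifted copies $H_p$ in Eqn. (9.8). Applied to the sets of Fig. 9.8, it should return $\{(1,1), (2,1), (3,1), (2,2), (3,2)\}$.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper: binary dilation on point sets, Eqn. (9.7). */
final class SetDilation {

    /** An integer image coordinate p = (u, v). */
    static final class Pt {
        final int u, v;
        Pt(int u, int v) { this.u = u; this.v = v; }
        @Override public boolean equals(Object o) {
            return o instanceof Pt && ((Pt) o).u == u && ((Pt) o).v == v;
        }
        @Override public int hashCode() { return 31 * u + v; }
        @Override public String toString() { return "(" + u + "," + v + ")"; }
    }

    /** Builds a point set from coordinate pairs u0, v0, u1, v1, ... */
    static Set<Pt> setOf(int... uv) {
        Set<Pt> s = new HashSet<>();
        for (int i = 0; i + 1 < uv.length; i += 2) s.add(new Pt(uv[i], uv[i + 1]));
        return s;
    }

    /** I (+) H = { p + q | p in I, q in H }: every pairwise vector sum. */
    static Set<Pt> dilate(Set<Pt> I, Set<Pt> H) {
        Set<Pt> r = new HashSet<>();
        for (Pt p : I)              // outer loop over I: union of the shifted copies H_p
            for (Pt q : H)
                r.add(new Pt(p.u + q.u, p.v + q.v));
        return r;
    }
}
```

Swapping the two arguments produces the same set, since vector addition is commutative.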


Fig. 9.8

Binary dilation example. The binary image $I$ is subject to dilation with the structuring element $H$. In the result $I \oplus H$ the structuring element $H$ is replicated at every foreground pixel of the original image $I$.

$I = \{(1,1), (2,1), (2,2)\}$, $H = \{(0,0), (1,0)\}$

$I \oplus H = \{(1,1)+(0,0),\ (1,1)+(1,0),\ (2,1)+(0,0),\ (2,1)+(1,0),\ (2,2)+(0,0),\ (2,2)+(1,0)\} = \{(1,1), (2,1), (3,1), (2,2), (3,2)\}$

² See also Sec. A.2 in the Appendix.

9.2.4 Erosion

The quasi-inverse of dilation is the erosion operation, again defined in set notation as

$$I \ominus H = \{ p \in \mathbb{Z}^2 \mid (p + q) \in I \text{ for all } q \in H \}. \qquad (9.9)$$

This operation can be interpreted as follows. A position $p$ is contained in the result $I \ominus H$ if (and only if) the structuring element $H$, when placed at this position $p$, is fully contained in the foreground pixels of the original image; that is, if $H_p$ is a subset of $I$. Equivalent to Eqn. (9.9), we could thus define binary erosion as

$$I \ominus H = \{ p \in \mathbb{Z}^2 \mid H_p \subseteq I \}. \qquad (9.10)$$

Figure 9.9 shows a simple example for binary erosion.

Fig. 9.9

Binary erosion example. The binary image $I$ is subject to erosion with $H$ as the structuring element. $H$ is only covered by $I$ when placed at position $p = (1,1)$, thus the resulting point set contains only the single coordinate $(1,1)$.

$I = \{(1,1), (2,1), (2,2)\}$, $H = \{(0,0), (1,0)\}$

$I \ominus H = \{(1,1)\}$ because $(1,1)+(0,0) = (1,1) \in I$ and $(1,1)+(1,0) = (2,1) \in I$.
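Eqn. (9.9) ranges over all of $\mathbb{Z}^2$, so a direct implementation needs a finite candidate set. The sketch below (hypothetical class `SetErosion`) uses the observation that every result point $p$ must satisfy $p + q_0 \in I$ for any fixed $q_0 \in H$; hence only the points $s - q_0$ with $s \in I$ have to be tested against Eqn. (9.10). It assumes that $H$ is non-empty.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper: binary erosion on point sets, Eqns. (9.9)-(9.10). */
final class SetErosion {

    /** An integer image coordinate p = (u, v). */
    static final class Pt {
        final int u, v;
        Pt(int u, int v) { this.u = u; this.v = v; }
        @Override public boolean equals(Object o) {
            return o instanceof Pt && ((Pt) o).u == u && ((Pt) o).v == v;
        }
        @Override public int hashCode() { return 31 * u + v; }
        @Override public String toString() { return "(" + u + "," + v + ")"; }
    }

    /** Builds a point set from coordinate pairs u0, v0, u1, v1, ... */
    static Set<Pt> setOf(int... uv) {
        Set<Pt> s = new HashSet<>();
        for (int i = 0; i + 1 < uv.length; i += 2) s.add(new Pt(uv[i], uv[i + 1]));
        return s;
    }

    /** I (-) H = { p | p + q in I for all q in H }; H must not be empty. */
    static Set<Pt> erode(Set<Pt> I, Set<Pt> H) {
        if (H.isEmpty()) throw new IllegalArgumentException("empty structuring element");
        Pt q0 = H.iterator().next();
        Set<Pt> r = new HashSet<>();
        // Any result p satisfies p + q0 in I, so the candidates are s - q0 for s in I.
        for (Pt s : I) {
            Pt p = new Pt(s.u - q0.u, s.v - q0.v);
            boolean covered = true;                  // is H_p a subset of I?
            for (Pt q : H) {
                if (!I.contains(new Pt(p.u + q.u, p.v + q.v))) {
                    covered = false;
                    break;
                }
            }
            if (covered) r.add(p);
        }
        return r;
    }
}
```

With the sets of Fig. 9.9 it returns $\{(1,1)\}$; swapping the arguments returns the empty set, a concrete instance of the non-commutativity of erosion (cf. Eqn. (9.15)).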


9.2.5 Formal Properties of Dilation and Erosion

The dilation operation is commutative,

$$I \oplus H = H \oplus I, \qquad (9.11)$$

and therefore, just as in linear convolution, the image and the structuring element (filter) can be exchanged to get the same result. Dilation is also associative, that is,

$$(I_1 \oplus I_2) \oplus I_3 = I_1 \oplus (I_2 \oplus I_3), \qquad (9.12)$$

and therefore the ordering of multiple dilations is not relevant. This also means, analogous to linear filters (cf. Eqn. (5.25)), that a dilation with a large structuring element of the form $H_{\text{big}} = H_1 \oplus H_2 \oplus \cdots \oplus H_K$ can be efficiently implemented as a sequence of multiple dilations with smaller structuring elements by

$$I \oplus H_{\text{big}} = (\cdots((I \oplus H_1) \oplus H_2) \oplus \cdots \oplus H_K). \qquad (9.13)$$
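The decomposition in Eqn. (9.13) can be checked on a tiny case. In this sketch (hypothetical class `DilationChain`), $H_x$ and $H_y$ are assumed to be horizontal and vertical 3-point line elements; their dilation $H_x \oplus H_y$ is the 3 × 3 square, and dilating an image with $H_x$ followed by $H_y$ should give the same point set as a single dilation with that square.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper for checking Eqns. (9.12)-(9.13) on small point sets. */
final class DilationChain {

    /** An integer image coordinate p = (u, v). */
    static final class Pt {
        final int u, v;
        Pt(int u, int v) { this.u = u; this.v = v; }
        @Override public boolean equals(Object o) {
            return o instanceof Pt && ((Pt) o).u == u && ((Pt) o).v == v;
        }
        @Override public int hashCode() { return 31 * u + v; }
    }

    /** Builds a point set from coordinate pairs u0, v0, u1, v1, ... */
    static Set<Pt> setOf(int... uv) {
        Set<Pt> s = new HashSet<>();
        for (int i = 0; i + 1 < uv.length; i += 2) s.add(new Pt(uv[i], uv[i + 1]));
        return s;
    }

    /** A (+) B = { a + b | a in A, b in B }, Eqn. (9.7). */
    static Set<Pt> dilate(Set<Pt> A, Set<Pt> B) {
        Set<Pt> r = new HashSet<>();
        for (Pt a : A)
            for (Pt b : B)
                r.add(new Pt(a.u + b.u, a.v + b.v));
        return r;
    }
}
```

Here `dilate(dilate(img, hx), hy)` and `dilate(img, dilate(hx, hy))` yield identical sets, so the 9-point square element is never needed explicitly.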

There is also a neutral element ($\delta$) for the dilation operation, similar to the Dirac function for the linear convolution (see Sec. 5.3.4),

$$I \oplus \delta = \delta \oplus I = I, \quad \text{with } \delta = \{(0,0)\}. \qquad (9.14)$$


The erosion operation is, in contrast to dilation (but similar to arithmetic subtraction), not commutative, that is,

$$I \ominus H \neq H \ominus I, \qquad (9.15)$$

in general. However, if erosion and dilation are combined, then, again in analogy with arithmetic subtraction and addition, the following chain rule holds:

$$(I_1 \ominus I_2) \ominus I_3 = I_1 \ominus (I_2 \oplus I_3). \qquad (9.16)$$

Although dilation and erosion are not mutually inverse (in general, the effects of dilation cannot be undone by a subsequent erosion), there are still some strong formal relations between these two operations. For one, dilation and erosion are dual in the sense that a dilation of the foreground ($I$) can be accomplished by an erosion of the background ($\bar{I}$) and subsequent inversion of the result,

$$I \oplus H = \overline{\bar{I} \ominus H^{*}}, \qquad (9.17)$$

where $H^{*}$ denotes the reflection of $H$ (Eqn. (9.6)). This works similarly the other way, too, namely

$$I \ominus H = \overline{\bar{I} \oplus H^{*}}, \qquad (9.18)$$

effectively eroding the foreground by dilating the background with the mirrored structuring element, as illustrated by the example in Fig. 9.10 (see [88, pp. 521-524] for a formal proof).

Fig. 9.10

Implementing erosion via dilation. The binary erosion of the foreground $I \ominus H$ (a) can be implemented by dilating the inverted (background) image $\bar{I}$ with the reflected structuring element $H^{*}$ and subsequently inverting the result again (b).

Equation (9.18) is interesting because it shows that we only need to implement either dilation or erosion for computing both, considering that the foreground-background inversion is a very simple task. Algorithm 9.1 gives a simple algorithmic description of dilation and erosion based on the aforementioned relationships.



Fig. 9.11

Typical binary structuring elements of various sizes. 4-neighborhood (a), 8-neighborhood (b), “small disk” (c).

Alg. 9.1

Binary dilation and erosion. Procedure Dilate() implements the binary dilation as suggested by Eqn. (9.8). The original image $I$ is displaced to each foreground coordinate of $H$ and then copied into the resulting image $I'$. The hot spot of the structuring element $H$ is assumed to be at coordinate $(0,0)$. Procedure Erode() implements the binary erosion by dilating the inverted image $\bar{I}$ with the reflected structuring element $H^{*}$, as described by Eqn. (9.18).

1: Dilate(I, H)
   Input: I, a binary image of size M × N; H, a binary structuring element.
   Returns the dilated image I' = I ⊕ H.
2:   Create map I': M × N ↦ {0, 1}           ▷ new binary image I'
3:   for all p ∈ M × N do
4:     I'(p) ← 0                             ▷ I' ← {}
5:   for all q ∈ H do
6:     for all p ∈ I do
7:       I'(p + q) ← 1                       ▷ I' ← I' ∪ {(p + q)}
8:   return I'                               ▷ I' = I ⊕ H

9: Erode(I, H)
   Input: I, a binary image of size M × N; H, a binary structuring element.
   Returns the eroded image I' = I ⊖ H.
10:  Ī ← Invert(I)                           ▷ Ī ← ¬I
11:  H* ← Reflect(H)
12:  I' ← Invert(Dilate(Ī, H*))              ▷ I' = I ⊖ H = ¬(Ī ⊕ H*)
13:  return I'
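Alg. 9.1 can be sketched in Java roughly as follows. This is not the imagingbook implementation: the class name `BinaryDilateErode`, the `int[u][v]` image layout with values 0/1, and the representation of $H$ as a list of `{dx, dy}` offsets relative to the hot spot $(0,0)$ are assumptions for illustration. Results that fall outside the image are simply dropped in `dilate()`. Because only pixels inside the array are inverted, positions outside the image effectively count as foreground in `erode()`, so structures touching the image border are not eroded from that side.

```java
/**
 * Hypothetical sketch of Alg. 9.1 (not the imagingbook implementation).
 * Images are int[u][v] arrays with 0 = background and 1 = foreground;
 * a structuring element is a list of offsets {dx, dy} relative to its hot spot (0, 0).
 */
final class BinaryDilateErode {

    /** Dilate(I, H): copy I, shifted by every offset q in H, into a new image (Eqn. 9.8). */
    static int[][] dilate(int[][] I, int[][] H) {
        int M = I.length, N = I[0].length;
        int[][] R = new int[M][N];                       // new image, all 0
        for (int[] q : H) {
            for (int u = 0; u < M; u++) {
                for (int v = 0; v < N; v++) {
                    if (I[u][v] == 0) continue;
                    int uu = u + q[0], vv = v + q[1];
                    if (uu >= 0 && uu < M && vv >= 0 && vv < N) {
                        R[uu][vv] = 1;                   // positions outside the image are dropped
                    }
                }
            }
        }
        return R;
    }

    /** Invert(I): exchange foreground and background (assumes 0/1 values). */
    static int[][] invert(int[][] I) {
        int M = I.length, N = I[0].length;
        int[][] R = new int[M][N];
        for (int u = 0; u < M; u++)
            for (int v = 0; v < N; v++)
                R[u][v] = 1 - I[u][v];
        return R;
    }

    /** Reflect(H): H* = { -q | q in H }, Eqn. (9.6). */
    static int[][] reflect(int[][] H) {
        int[][] R = new int[H.length][];
        for (int i = 0; i < H.length; i++) R[i] = new int[] {-H[i][0], -H[i][1]};
        return R;
    }

    /** Erode(I, H) via the duality of Eqn. (9.18): invert(dilate(invert(I), H*)). */
    static int[][] erode(int[][] I, int[][] H) {
        return invert(dilate(invert(I), reflect(H)));
    }
}
```

With the sets of Figs. 9.8 and 9.9 placed inside a 5 × 4 array, `dilate()` sets the five pixels of $I \oplus H$ and `erode()` leaves only pixel $(1,1)$.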

9.2.6 Designing Morphological Filters

A morphological filter is unambiguously specified by (a) the type of operation and (b) the contents of the structuring element. The appropriate size and shape of the structuring element depend upon the application, image resolution, etc. In practice, structuring elements of quasi-circular shape are frequently used, such as the examples shown in Fig. 9.11.

A dilation with a circular (disk-shaped) structuring element with radius $r$ adds a layer of thickness $r$ to any foreground structure in the image. Conversely, an erosion with that structuring element peels off layers of the same thickness. Figure 9.13 shows the results of dilation and erosion with disk-shaped structuring elements of different diameters applied to the original image in Fig. 9.12. Dilation and erosion results for various other structuring elements are shown in Fig. 9.14.

Fig. 9.12

Original binary image and the section used in the following examples (illustration by Albrecht Dürer, 1515).

Fig. 9.13

Results of binary dilation and erosion with disk-shaped structuring elements. The radius of the disk (r) is 1.0 (a), 2.5 (b), and 5.0 (c).

Disk-shaped structuring elements are commonly used to implement isotropic filters, morphological operations that have the same effect in every direction. Unlike linear filters (e.g., the 2D Gaussian filter in Sec. 5.3.3), it is generally not possible to compose an isotropic 2D structuring element $H^{\circ}$ from 1D structuring elements $H_x$ and $H_y$, since the dilation $H_x \oplus H_y$ always results in a rectangular (i.e., nonisotropic) structure. A remedy for approximating large disk-shaped filters is to alternately apply smaller disk-shaped operators of different shapes, as illustrated in Fig. 9.15. The resulting filter is generally not fully isotropic but can be implemented efficiently as a sequence of small filters.
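One simple way to generate disk-shaped structuring elements such as Fig. 9.11(c) is to collect all integer offsets within Euclidean distance $r$ of the hot spot. The sketch below (hypothetical helper `DiskElement`) does exactly that; the library's own `Disk` class may discretize the disk differently. With this rule, $r = 1.0$ yields the 4-neighborhood of Fig. 9.11(a) and $r = 1.5$ the full 3 × 3 square of Fig. 9.11(b).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: offsets of a disk-shaped structuring element of radius r. */
final class DiskElement {

    /** All integer offsets (i, j) with i*i + j*j <= r*r; the hot spot is (0, 0). */
    static int[][] makeDisk(double r) {
        int rc = (int) Math.floor(r);            // no offset can exceed floor(r) along either axis
        List<int[]> offsets = new ArrayList<>();
        for (int j = -rc; j <= rc; j++)
            for (int i = -rc; i <= rc; i++)
                if (i * i + j * j <= r * r) offsets.add(new int[] {i, j});
        return offsets.toArray(new int[0][]);
    }
}
```

For $r = 2.5$ this rule keeps 21 of the 25 offsets of the 5 × 5 square (the four corners are dropped); the returned offsets can serve as the set $H$ for a dilation or erosion.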

Fig. 9.14

Examples of binary dilation and erosion with various free-form structuring elements. The structuring elements $H$ are shown in the left column (enlarged). Notice that the dilation expands every isolated foreground point to the shape of the structuring element, analogous to the impulse response of a linear filter. Under erosion, only those elements where the structuring element is fully contained in the original image survive.

9.2.7 Application Example: Outline

A typical application of morphological operations is to extract the boundary pixels of the foreground structures. The process is very simple. First, we apply an erosion on the original image $I$ to remove the boundary pixels of the foreground,

$$I' = I \ominus H_n,$$

where $H_n$ is a structuring element, for example, a 4- or 8-neighborhood (Fig. 9.11). The actual boundary pixels $B$ are those contained in the original image but not in the eroded image, that is, the intersection of the original image $I$ and the inverted result, or

$$B = I \cap \overline{(I \ominus H_n)}. \qquad (9.19)$$

Figure 9.17 shows an example for the extraction of region boundaries. Notice that using the 4-neighborhood as the structuring element $H_n$ produces “8-connected” contours and vice versa [125, p. 504].

The process of boundary extraction is illustrated on a simple example in Fig. 9.16. As can be observed in this figure, the result $B$ contains exactly those pixels that are different in the original image $I$ and the eroded image $I' = I \ominus H_n$, which can also be obtained by an exclusive-OR (XOR) operation between pairs of pixels; that is, boundary extraction from a binary image can be implemented as

$$B(p) \leftarrow I(p) \;\text{XOR}\; (I \ominus H_n)(p), \quad \text{for all } p. \qquad (9.20)$$

Figure 9.17 shows a more complex example for isolating the boundary pixels in a real image.
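Eqn. (9.20) leads to a very short implementation. The following sketch (hypothetical class `Outline`) erodes with the 4-neighborhood by testing Eqn. (9.10) directly at every pixel and then combines the original and the eroded image with a pixel-wise XOR. It assumes 0/1 pixel values in an `int[u][v]` array and treats positions outside the image as background, so foreground pixels touching the border end up in the boundary.

```java
/** Hypothetical sketch of boundary extraction, Eqn. (9.20): B(p) = I(p) XOR (I eroded by Hn)(p). */
final class Outline {

    /** 4-neighborhood structuring element (cf. Fig. 9.11(a)), hot spot at (0, 0). */
    static final int[][] N4 = {{0, 0}, {1, 0}, {-1, 0}, {0, 1}, {0, -1}};

    /** Pixel lookup; positions outside the image count as background (an assumption). */
    static int get(int[][] I, int u, int v) {
        return (u >= 0 && u < I.length && v >= 0 && v < I[0].length) ? I[u][v] : 0;
    }

    /** Erosion by direct test of Eqn. (9.10): keep p only if H_p lies inside I. */
    static int[][] erode(int[][] I, int[][] H) {
        int M = I.length, N = I[0].length;
        int[][] R = new int[M][N];
        for (int u = 0; u < M; u++)
            for (int v = 0; v < N; v++) {
                int keep = 1;
                for (int[] q : H) {
                    if (get(I, u + q[0], v + q[1]) == 0) { keep = 0; break; }
                }
                R[u][v] = keep;
            }
        return R;
    }

    /** Boundary pixels: original XOR eroded, pixel by pixel. */
    static int[][] outline(int[][] I, int[][] H) {
        int[][] E = erode(I, H);
        int M = I.length, N = I[0].length;
        int[][] B = new int[M][N];
        for (int u = 0; u < M; u++)
            for (int v = 0; v < N; v++)
                B[u][v] = I[u][v] ^ E[u][v];
        return B;
    }
}
```

For a filled 5 × 5 square, the eroded image keeps the inner 3 × 3 block, so the outline consists of 25 − 9 = 16 pixels.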


Fig. 9.15

Composition of large morphological filters by repeated application of smaller filters: repeated application of the structuring element $H_A$ (a) and structuring element $H_B$ (b); alternating application of $H_B$ and $H_A$ (c).

Fig. 9.16

Outline example using a 4-neighborhood structuring element $H_n$. The image $I$ is first eroded ($I \ominus H_n$) and subsequently inverted ($\overline{I \ominus H_n}$). The boundary pixels are finally obtained as the intersection $I \cap \overline{I \ominus H_n}$.

Fig. 9.17

Extraction of boundary pixels using morphological operations. The 4-neighborhood structuring element used in (a) produces 8-connected contours. Conversely, using the 8-neighborhood as the structuring element gives 4-connected contours (b).


9.3 Composite Morphological Operations

Due to their semiduality, dilation and erosion are often used together in composite operations, two of which are so important that they even carry their own names and symbols: “opening” and “closing”. They are probably the most frequently used morphological operations in practice.

9.3.1 Opening

A binary opening $I \circ H$ denotes an erosion followed by a dilation with the same structuring element $H$,

$$I \circ H = (I \ominus H) \oplus H. \qquad (9.21)$$

The main effect of an opening is that all foreground structures that are smaller than the structuring element are eliminated in the first step (erosion). The remaining structures are smoothed by the subsequent dilation and grown back to approximately their original size, as demonstrated by the examples in Fig. 9.18. This process of shrinking and subsequent growing corresponds to the idea for eliminating small structures that we had initially sketched in Sec. 9.1.

9.3.2 Closing

When the sequence of erosion and dilation is reversed, the resulting operation is called a closing and denoted $I \bullet H$,

$$I \bullet H = (I \oplus H) \ominus H. \qquad (9.22)$$

Fig. 9.18

Binary opening and closing with disk-shaped structuring elements. The radius $r$ of the structuring element $H$ is 1.0 (top), 2.5 (center), or 5.0 (bottom).

A closing removes (closes) holes and fissures in the foreground structures that are smaller than the structuring element $H$. Some examples with typical disk-shaped structuring elements are shown in Fig. 9.18.
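Opening and closing are plain compositions of the two basic operations. In the sketch below (hypothetical class `OpenClose`), dilation and erosion are evaluated pixel by pixel from Eqns. (9.8) and (9.9) on an `int[u][v]` array with 0/1 values, with positions outside the image treated as background; under this convention a closing may shrink foreground that touches the image border.

```java
/** Hypothetical sketch of binary opening (Eqn. 9.21) and closing (Eqn. 9.22) on img[u][v] in {0, 1}. */
final class OpenClose {

    /** Pixel lookup; positions outside the image count as background (an assumption). */
    static int get(int[][] I, int u, int v) {
        return (u >= 0 && u < I.length && v >= 0 && v < I[0].length) ? I[u][v] : 0;
    }

    /** (I (+) H)(p) = 1 iff I(p - q) = 1 for some q in H. */
    static int[][] dilate(int[][] I, int[][] H) {
        int M = I.length, N = I[0].length;
        int[][] R = new int[M][N];
        for (int u = 0; u < M; u++)
            for (int v = 0; v < N; v++)
                for (int[] q : H)
                    if (get(I, u - q[0], v - q[1]) == 1) { R[u][v] = 1; break; }
        return R;
    }

    /** (I (-) H)(p) = 1 iff I(p + q) = 1 for all q in H, Eqn. (9.9). */
    static int[][] erode(int[][] I, int[][] H) {
        int M = I.length, N = I[0].length;
        int[][] R = new int[M][N];
        for (int u = 0; u < M; u++)
            for (int v = 0; v < N; v++) {
                R[u][v] = 1;
                for (int[] q : H)
                    if (get(I, u + q[0], v + q[1]) == 0) { R[u][v] = 0; break; }
            }
        return R;
    }

    /** Opening: erosion followed by dilation with the same structuring element. */
    static int[][] open(int[][] I, int[][] H)  { return dilate(erode(I, H), H); }

    /** Closing: dilation followed by erosion with the same structuring element. */
    static int[][] close(int[][] I, int[][] H) { return erode(dilate(I, H), H); }
}
```

With a 3 × 3 square structuring element, opening a 5 × 5 square plus one isolated pixel removes only the isolated pixel, and closing a 5 × 5 square with a one-pixel hole fills the hole. Applying the same opening a second time changes nothing, consistent with the idempotence of opening and closing.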

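9.3.3 Properties of Opening and Closing

Both operations, opening as well as closing, are idempotent, meaning that their results are “final” in the sense that any subsequent application of the same operation no longer changes the result, that is,

$$\begin{aligned} I \circ H &= (I \circ H) \circ H = ((I \circ H) \circ H) \circ H = \ldots, \\ I \bullet H &= (I \bullet H) \bullet H = ((I \bullet H) \bullet H) \bullet H = \ldots. \end{aligned} \qquad (9.23)$$

Also, opening and closing are “duals” in the sense that opening the foreground is equivalent to closing the background with the reflected structuring element $H^{*}$ (which coincides with $H$ for symmetric structuring elements), and vice versa, that is,

$$I \circ H = \overline{\bar{I} \bullet H^{*}} \quad \text{and} \quad I \bullet H = \overline{\bar{I} \circ H^{*}}. \qquad (9.24)$$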
9.4 Thinning (Skeletonization)

Thinning is a common morphological technique which aims at shrinking binary structures down to a maximum thickness of one pixel without splitting them into multiple parts. This is accomplished by iterative “conditional” erosion, which is applied to a local neighborhood only if a sufficiently thick structure remains and the operation does not cause a separation to occur. This requires that, depending on the local image structure, a decision must be made at every image position whether another erosion step may be applied or not. The operation continues until no more changes appear in the resulting image. It follows that, compared to the ordinary (“homogeneous”) morphological operations discussed earlier, thinning is computationally expensive in general. A frequent application of thinning is to calculate the “skeleton” of a binary region, for example, for structural matching of 2D shapes.

Thinning is also known by the terms center line detection and medial axis transform. Many different implementations of varied complexity and efficiency exist (see, e.g., [2, 7, 68, 108, 201]). In the following, we describe the classic algorithm by Zhang and Suen [265] and its implementation as a representative example.³

9.4.1 Thinning Algorithm by Zhang and Suen

The input to this algorithm is a binary image $I$, with foreground pixels carrying the value 1 and background pixels with value 0. The algorithm scans the image and at each position $(u, v)$ examines a $3 \times 3$ neighborhood with the central element $P$ and the surrounding values $\mathbf{N} = (N_0, N_1, \ldots, N_7)$, as illustrated in Fig. 9.5(b). The complete process is summarized in Alg. 9.2.

For classifying the contents of the local neighborhood $\mathbf{N}$ we first define the function

$$B(\mathbf{N}) = N_0 + N_1 + \cdots + N_7 = \sum_{i=0}^{7} N_i, \qquad (9.25)$$

which simply counts surrounding foreground pixels. We also define the so-called “connectivity number” to express how many binary components are connected via the current center pixel at position $(u, v)$. This quantity is equivalent to the number of $1 \to 0$ transitions in the sequence $(N_0, \ldots, N_7, N_0)$, or expressed in arithmetic terms,

$$C(\mathbf{N}) = \sum_{i=0}^{7} N_i \cdot \left[ N_i - N_{(i+1) \bmod 8} \right]. \qquad (9.26)$$


Figure 9.19 shows some selected examples for the neighborhood $\mathbf{N}$ and the associated values for the functions $B(\mathbf{N})$ and $C(\mathbf{N})$. Based on the above functions, we finally define two Boolean predicates $R_1$, $R_2$ on the neighborhood $\mathbf{N}$,

$$R_1(\mathbf{N}) := [2 \le B(\mathbf{N}) \le 6] \wedge [C(\mathbf{N}) = 1] \wedge [N_6 \cdot N_0 \cdot N_2 = 0] \wedge [N_4 \cdot N_6 \cdot N_0 = 0], \qquad (9.27)$$

$$R_2(\mathbf{N}) := [2 \le B(\mathbf{N}) \le 6] \wedge [C(\mathbf{N}) = 1] \wedge [N_0 \cdot N_2 \cdot N_4 = 0] \wedge [N_2 \cdot N_4 \cdot N_6 = 0]. \qquad (9.28)$$

³ The built-in thinning operation in ImageJ is also based on this algorithm.
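Eqns. (9.25) to (9.28) can be written down almost verbatim. The sketch below (hypothetical class `ZhangSuenPredicates`) expects the neighborhood as an `int` array $\mathbf{N} = (N_0, \ldots, N_7)$ with 0/1 entries, in the neighbor order defined by GetNeighborhood() in Alg. 9.2 ($N_0$ to the right of the center, continuing counterclockwise).

```java
/** Hypothetical sketch of Eqns. (9.25)-(9.28) for a neighborhood N = (N0, ..., N7) with 0/1 entries. */
final class ZhangSuenPredicates {

    /** B(N): number of foreground neighbors, Eqn. (9.25). */
    static int countB(int[] N) {
        int s = 0;
        for (int n : N) s += n;
        return s;
    }

    /** C(N): number of 1 -> 0 transitions in (N0, ..., N7, N0), Eqn. (9.26). */
    static int countC(int[] N) {
        int s = 0;
        for (int i = 0; i < 8; i++) s += N[i] * (N[i] - N[(i + 1) % 8]);
        return s;
    }

    /** R1(N), Eqn. (9.27): deletion condition tested in Pass 1. */
    static boolean r1(int[] N) {
        int b = countB(N);
        return 2 <= b && b <= 6 && countC(N) == 1
            && N[6] * N[0] * N[2] == 0 && N[4] * N[6] * N[0] == 0;
    }

    /** R2(N), Eqn. (9.28): deletion condition tested in Pass 2. */
    static boolean r2(int[] N) {
        int b = countB(N);
        return 2 <= b && b <= 6 && countC(N) == 1
            && N[0] * N[2] * N[4] == 0 && N[2] * N[4] * N[6] == 0;
    }
}
```

For instance, the all-background neighborhood gives $B = 0$, $C = 0$, and a neighborhood with only the four diagonal neighbors set gives $B = 4$, $C = 4$. The pattern $(1,1,1,1,1,0,0,0)$ satisfies $R_1$ but not $R_2$, because $N_0 \cdot N_2 \cdot N_4 = 1$.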


(Fig. 9.19 panels: five example neighborhoods with B = 0, C = 0; B = 7, C = 1; B = 6, C = 2; B = 3, C = 3; and B = 4, C = 4. Legend: □ background (0), ■ foreground (1), center pixel (1).)

Depending on the outcome of $R_1(\mathbf{N})$ and $R_2(\mathbf{N})$, the foreground pixel at the center position of $\mathbf{N}$ is either deleted (i.e., eroded) or retained (see Alg. 9.2, lines 16 and 27).

Figure 9.20 illustrates the effect of layer-by-layer thinning performed by procedure ThinOnce(). In every iteration, only one “layer” of foreground pixels is selectively deleted. An example of thinning applied to a larger binary image is shown in Fig. 9.21.


(Fig. 9.20 panels: (a) Original, (b) 1359 deletions, (c) 881 deletions.)


9.4.2 Fast Thinning Algorithm

In a binary image, only $2^8 = 256$ different combinations of zeros and ones are possible inside any 8-neighborhood. Since the expressions in Eqns. (9.27)-(9.28) are relatively costly to evaluate, it makes sense to pre-calculate and tabulate all 256 instances (see Fig. 9.22). This is the basis of the fast version of Zhang and Suen's algorithm, summarized in Alg. 9.3. It uses a decision table $Q$, which is constant and calculated only once by procedure MakeDeletionCodeTable() in Alg. 9.3 (lines 34-45). The table contains the binary codes

$$Q(i) \in \{0, 1, 2, 3\} = \{00_b, 01_b, 10_b, 11_b\}, \qquad (9.29)$$

for $i = 0, \ldots, 255$, where the two bits correspond to the predicates $R_1$ (bit 0) and $R_2$ (bit 1), respectively. The associated test is found in procedure ThinOnceFast() in line 19. The two passes are in this case controlled by a separate loop variable ($p = 1, 2$). In the concrete implementation, the map $Q$ is not calculated at the start but defined as a constant array (see Prog. 9.1 for the actual Java code).
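The table-driven variant of Alg. 9.3 can be sketched in Java as shown here. This is a hypothetical stand-alone version (class `FastThinning`, `int[u][v]` images with values 0/1, positions outside the image read as background), not the library code. For clarity it computes $Q$ with a MakeDeletionCodeTable()-style method instead of hard-coding the constant array of Prog. 9.1, which it should reproduce exactly.

```java
/**
 * Hypothetical sketch of Alg. 9.3 (fast Zhang-Suen thinning), not the imagingbook code.
 * Images are int[u][v] arrays with background = 0 and foreground = 1.
 */
final class FastThinning {

    /** Q(i): bit 0 set if R1(N) holds, bit 1 set if R2(N) holds, with N_k = bit k of i. */
    static byte[] makeDeletionCodeTable() {
        byte[] table = new byte[256];
        for (int i = 0; i < 256; i++) {
            int[] n = new int[8];
            int b = 0;                                   // B(N), Eqn. (9.25)
            for (int k = 0; k < 8; k++) {
                n[k] = (i >> k) & 1;
                b += n[k];
            }
            int c = 0;                                   // C(N), Eqn. (9.26)
            for (int k = 0; k < 8; k++) c += n[k] * (n[k] - n[(k + 1) % 8]);
            boolean common = 2 <= b && b <= 6 && c == 1;
            int q = 0;
            if (common && n[6] * n[0] * n[2] == 0 && n[4] * n[6] * n[0] == 0) q |= 1; // R1
            if (common && n[0] * n[2] * n[4] == 0 && n[2] * n[4] * n[6] == 0) q |= 2; // R2
            table[i] = (byte) q;
        }
        return table;
    }

    static final byte[] Q = makeDeletionCodeTable();

    /** Pixel lookup; positions outside the image count as background (an assumption). */
    static int get(int[][] img, int u, int v) {
        return (u >= 0 && u < img.length && v >= 0 && v < img[0].length) ? img[u][v] : 0;
    }

    /** Like GetNeighborhoodIndex(): c = N0 + 2 N1 + 4 N2 + ... + 128 N7. */
    static int neighborhoodIndex(int[][] img, int u, int v) {
        return get(img, u + 1, v)
             | get(img, u + 1, v - 1) << 1
             | get(img, u, v - 1) << 2
             | get(img, u - 1, v - 1) << 3
             | get(img, u - 1, v) << 4
             | get(img, u - 1, v + 1) << 5
             | get(img, u, v + 1) << 6
             | get(img, u + 1, v + 1) << 7;
    }

    /** Like ThinOnceFast(): two passes; marked pixels are deleted en bloc after each pass. */
    static int thinOnceFast(int[][] img) {
        int width = img.length, height = img[0].length, nd = 0;
        boolean[][] d = new boolean[width][height];
        for (int p = 1; p <= 2; p++) {                   // pass 1 tests bit 0, pass 2 tests bit 1
            int n = 0;
            for (int u = 0; u < width; u++)
                for (int v = 0; v < height; v++) {
                    d[u][v] = img[u][v] == 1 && (p & Q[neighborhoodIndex(img, u, v)]) != 0;
                    if (d[u][v]) n++;
                }
            if (n > 0) {
                nd += n;
                for (int u = 0; u < width; u++)
                    for (int v = 0; v < height; v++)
                        if (d[u][v]) img[u][v] = 0;
            }
        }
        return nd;
    }

    /** Like ThinFast(): iterates until no pixel is deleted or imax iterations are reached. */
    static int thinFast(int[][] img, int imax) {
        int i = 0, nd;
        do {
            nd = thinOnceFast(img);
            i++;
        } while (nd > 0 && i < imax);
        return i;
    }
}
```

Applied to a solid 3 × 3 block, the first iteration deletes eight pixels (six in Pass 1, two in Pass 2) and leaves only the center; the second iteration deletes nothing, so `thinFast()` returns 2.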


Fig. 9.19

Selected binary neighborhood patterns $\mathbf{N}$ and associated function values $B(\mathbf{N})$ and $C(\mathbf{N})$ (see Eqns. (9.25)-(9.26)).

Fig. 9.20

Iterative application of the ThinOnce() procedure. The “deletions” indicated in (b-f) denote the number of pixels that were removed from the previous image. No deletions occurred in the final iteration (from (e) to (f)). Thus five iterations were required to thin this image.
Alg. 9.2

Iterative thinning algorithm by Zhang and Suen [265]. Procedure ThinOnce() performs a single thinning step on the supplied binary image $I_b$ and returns the number of deleted foreground pixels. It is iteratively invoked by Thin() until no more pixels are deleted. The required pixel deletions are only registered in the binary map $D$ and executed en bloc at the end of each pass. Lines 40-42 define the functions $R_1()$, $R_2()$, $B()$ and $C()$ used to characterize the local pixel neighborhoods. Note that the order of processing the image positions $(u, v)$ in the for all loops in Pass 1 and Pass 2 is completely arbitrary. In particular, positions could be processed simultaneously, so the algorithm may be easily parallelized (and thereby accelerated).

1: Thin(I_b, i_max)
   Input: I_b, binary image with background = 0, foreground = 1; i_max, max. number of iterations.
   Returns the number of iterations performed and modifies I_b.
2:   (M, N) ← Size(I_b)
3:   Create a binary map D: M × N ↦ {0, 1}
4:   i ← 0
5:   do
6:     n_d ← ThinOnce(I_b, D)
7:     i ← i + 1
8:   while (n_d > 0 ∧ i < i_max)             ▷ do ... while more deletions required
9:   return i

10: ThinOnce(I_b, D)
   Pass 1:
11:  n_1 ← 0                                 ▷ deletion counter
12:  for all image positions (u, v) ∈ M × N do
13:    D(u, v) ← 0
14:    if I_b(u, v) > 0 then
15:      N ← GetNeighborhood(I_b, u, v)
16:      if R_1(N) then                      ▷ see Eqn. (9.27)
17:        D(u, v) ← 1                       ▷ mark pixel (u, v) for deletion
18:        n_1 ← n_1 + 1
19:  if n_1 > 0 then                         ▷ at least 1 deletion required
20:    for all image positions (u, v) ∈ M × N do
21:      I_b(u, v) ← I_b(u, v) − D(u, v)     ▷ delete all marked pixels
   Pass 2:
22:  n_2 ← 0
23:  for all image positions (u, v) ∈ M × N do
24:    D(u, v) ← 0
25:    if I_b(u, v) > 0 then
26:      N ← GetNeighborhood(I_b, u, v)
27:      if R_2(N) then                      ▷ see Eqn. (9.28)
28:        D(u, v) ← 1                       ▷ mark pixel (u, v) for deletion
29:        n_2 ← n_2 + 1
30:  if n_2 > 0 then                         ▷ at least 1 deletion required
31:    for all image positions (u, v) ∈ M × N do
32:      I_b(u, v) ← I_b(u, v) − D(u, v)     ▷ delete all marked pixels
33:  return n_1 + n_2

34: GetNeighborhood(I_b, u, v)
35:  N_0 ← I_b(u + 1, v),      N_1 ← I_b(u + 1, v − 1)
36:  N_2 ← I_b(u, v − 1),      N_3 ← I_b(u − 1, v − 1)
37:  N_4 ← I_b(u − 1, v),      N_5 ← I_b(u − 1, v + 1)
38:  N_6 ← I_b(u, v + 1),      N_7 ← I_b(u + 1, v + 1)
39:  return (N_0, N_1, ..., N_7)

40: R_1(N) := [2 ≤ B(N) ≤ 6] ∧ [C(N) = 1] ∧ [N_6·N_0·N_2 = 0] ∧ [N_4·N_6·N_0 = 0]
41: R_2(N) := [2 ≤ B(N) ≤ 6] ∧ [C(N) = 1] ∧ [N_0·N_2·N_4 = 0] ∧ [N_2·N_4·N_6 = 0]
42: B(N) := Σ_{i=0..7} N_i,   C(N) := Σ_{i=0..7} N_i · [N_i − N_{(i+1) mod 8}]
1: ThinFast(I_b, i_max)
   Input: I_b, binary image with background = 0, foreground = 1; i_max, max. number of iterations.
   Returns the number of iterations performed and modifies I_b.
2:   (M, N) ← Size(I_b)
3:   Q ← MakeDeletionCodeTable()
4:   Create a binary map D: M × N ↦ {0, 1}
5:   i ← 0
6:   do
7:     n_d ← ThinOnceFast(I_b, D);  i ← i + 1
8:   while (n_d > 0 ∧ i < i_max)             ▷ do ... while more deletions required
9:   return i

10: ThinOnceFast(I_b, D)                    ▷ performs a single thinning iteration
11:  n_d ← 0                                ▷ number of deletions in both passes
12:  for p ← 1, 2 do                        ▷ pass counter (2 passes)
13:    n ← 0                                ▷ number of deletions in current pass
14:    for all image positions (u, v) do
15:      D(u, v) ← 0
16:      if I_b(u, v) = 1 then              ▷ I_b(u, v) = P
17:        c ← GetNeighborhoodIndex(I_b, u, v)
18:        q ← Q(c)                         ▷ q ∈ {0, 1, 2, 3} = {00b, 01b, 10b, 11b}
19:        if (p and q) ≠ 0 then            ▷ bitwise ‘and’ operation
20:          D(u, v) ← 1                    ▷ mark pixel (u, v) for deletion
21:          n ← n + 1
22:    if n > 0 then                        ▷ at least 1 deletion is required
23:      n_d ← n_d + n
24:      for all image positions (u, v) do
25:        I_b(u, v) ← I_b(u, v) − D(u, v)  ▷ delete all marked pixels
26:  return n_d

27: GetNeighborhoodIndex(I_b, u, v)
28:  N_0 ← I_b(u + 1, v),      N_1 ← I_b(u + 1, v − 1)
29:  N_2 ← I_b(u, v − 1),      N_3 ← I_b(u − 1, v − 1)
30:  N_4 ← I_b(u − 1, v),      N_5 ← I_b(u − 1, v + 1)
31:  N_6 ← I_b(u, v + 1),      N_7 ← I_b(u + 1, v + 1)
32:  c ← N_0 + N_1·2 + N_2·4 + N_3·8 + N_4·16 + N_5·32 + N_6·64 + N_7·128
33:  return c                               ▷ c ∈ [0, 255]

34: MakeDeletionCodeTable()
35:  Create maps Q: [0, 255] ↦ {0, 1, 2, 3},  N: [0, 7] ↦ {0, 1}
36:  for i ← 0, ..., 255 do                 ▷ list all possible neighborhoods
37:    for k ← 0, ..., 7 do                 ▷ check neighbors 0, ..., 7
38:      N(k) ← 1 if (i and 2^k) ≠ 0, otherwise 0   ▷ test the kth bit of i
39:    q ← 0
40:    if R_1(N) then                       ▷ see Alg. 9.2, line 40
41:      q ← q + 1                          ▷ set bit 0 of q
42:    if R_2(N) then                       ▷ see Alg. 9.2, line 41
43:      q ← q + 2                          ▷ set bit 1 of q
44:    Q(i) ← q                             ▷ q ∈ {0, 1, 2, 3} = {00b, 01b, 10b, 11b}
45:  return Q

Alg. 9.3

Thinning algorithm by Zhang and Suen (accelerated version of Alg. 9.2). This algorithm employs a pre-calculated table of “deletion codes” (Q). Procedure GetNeighborhood() has been replaced by GetNeighborhoodIndex(), which does not return the neighboring pixel values themselves but the associated 8-bit index c with possible values in 0, ..., 255 (see Fig. 9.22). For completeness, the calculation of table Q is included in procedure MakeDeletionCodeTable(), although this table is fixed and may be simply defined as a constant array (see Prog. 9.1).
Fig. 9.21

Thinning a binary image (Alg. 9.2 or 9.3). Original image with enlarged detail (a, c) and results after thinning (b, d). The original foreground pixels are marked green, the resulting pixels are black.


Prog. 9.1

Java definition for the “deletion code” table Q (see Fig. 9.22).

    static final byte[] Q = {
        0, 0, 0, 3, 0, 0, 3, 3, 0, 0, 0, 0, 3, 0, 3, 3,
        0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 3, 0, 3, 1,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        3, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 3, 0, 3, 1,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        3, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 1, 0, 1, 0,
        0, 3, 0, 3, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 3,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        3, 3, 0, 3, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 2,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        3, 3, 0, 3, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0,
        3, 2, 0, 2, 0, 0, 0, 0, 3, 2, 0, 0, 1, 0, 0, 0
    };

9.4.3 Java Implementation

The complete Java source code for the morphological operations on binary images is available online as part of the imagingbook⁴ library.

⁴ Package imagingbook.pub.morphology.
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Fig. 9.22

“Deletion codes” for the 256 possible binary 8-neighborhoods tabulated in map Q(c) of Alg. 9.3. □ = 0 and ■ = 1 denote background and foreground pixels, respectively. The 2-bit codes are color coded as indicated at the bottom.

[Figure: the 256 neighborhood patterns c = 0, ..., 255, each shown with its deletion code]

Codes Q(c) for c = 0, ..., 255:
0 = 00b (never deleted)
1 = 01b (deleted only in Pass 1)
2 = 10b (deleted only in Pass 2)
3 = 11b (deleted in Pass 1 and 2)
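Each entry Q(c) of the table packs two deletion flags into one 2-bit value: bit 0 (value 1) stands for Pass 1 and bit 1 (value 2) for Pass 2. The following minimal sketch shows how such a code could be tested inside a thinning pass. The class and method names are hypothetical (not part of the imagingbook library), and the computation of the neighborhood index c itself, which Alg. 9.3 defines, is not reproduced here.

```java
/** Hypothetical helper for testing the 2-bit deletion codes Q(c) of Fig. 9.22. */
final class DeletionCode {
    private DeletionCode() { }

    /**
     * Returns true if a pixel whose neighborhood has deletion code q (0..3)
     * may be deleted in the given pass (1 or 2).
     * Pass 1 is encoded in bit 0 (01b), Pass 2 in bit 1 (10b).
     */
    static boolean deletedInPass(int q, int pass) {
        if (pass != 1 && pass != 2) {
            throw new IllegalArgumentException("pass must be 1 or 2");
        }
        return (q & pass) != 0;   // mask 1 tests bit 0, mask 2 tests bit 1
    }
}
```

With this encoding, code 3 (11b) passes the test in both passes and code 0 in neither.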


BinaryMorphologyFilter  class 

This  class  implements  several  morphological  operators  for  binary  im¬ 
ages  of  type  ByteProcessor.  It  defines  the  sub-classes  Box  and  Disk 
with  different  structuring  elements.  The  class  provides  the  following 
constructors: 




BinaryMorphologyFilter()

Creates a morphological filter with a (default) structuring element of size 3 × 3 as depicted in Fig. 9.11(b).

BinaryMorphologyFilter(int[][] H)

Creates a morphological filter with a structuring element specified by the 2D array H, which may contain 0/1 values only (all values > 0 are treated as 1).

BinaryMorphologyFilter.Box(int rad)

Creates a morphological filter with a square structuring element of radius rad ≥ 1 and side length 2 · rad + 1 pixels.

BinaryMorphologyFilter.Disk(double rad)

Creates a morphological filter with a disk-shaped structuring element with radius rad ≥ 1 and diameter 2 · round(rad) + 1 pixels.
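The documented size of the Disk element (diameter 2 · round(rad) + 1) can be illustrated with a short sketch. It assumes that a cell (i, j) belongs to the disk if i² + j² ≤ rad²; the membership rule actually used by BinaryMorphologyFilter.Disk may differ, and the class DiskElement is hypothetical. The result is a row-major 0/1 array of the kind described for the constructor BinaryMorphologyFilter(int[][] H).

```java
/** Hypothetical sketch: builds a disk-shaped 0/1 structuring element. */
final class DiskElement {
    private DiskElement() { }

    static int[][] makeDisk(double rad) {
        int r = (int) Math.round(rad);   // integer half-width
        int size = 2 * r + 1;            // diameter 2 * round(rad) + 1
        double r2 = rad * rad;
        int[][] H = new int[size][size];
        for (int j = -r; j <= r; j++) {
            for (int i = -r; i <= r; i++) {
                // assumption: a cell is inside if its center lies within distance rad
                H[j + r][i + r] = (i * i + j * j <= r2) ? 1 : 0;
            }
        }
        return H;
    }
}
```

For rad = 1.0 this yields the 3 × 3 "cross" (center plus its four direct neighbors).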

The key methods⁵ of BinaryMorphologyFilter are:

void applyTo(ByteProcessor I, OpType op)

Destructively applies the morphological operator op to the image I. Possible arguments for op are Dilate, Erode, Open, Close, Outline, and Thin.

void dilate(ByteProcessor I)

Performs (destructive) dilation on the binary image I with the structuring element of this filter.

void erode(ByteProcessor I)

Performs (destructive) erosion on the binary image I.

void open(ByteProcessor I)

Performs (destructive) opening on the binary image I.

void close(ByteProcessor I)

Performs (destructive) closing on the binary image I.

void outline(ByteProcessor I)

Performs a (destructive) outline operation on the binary image I using a 3 × 3 structuring element (see Sec. 9.2.7).

void thin(ByteProcessor I)

Performs a (destructive) thinning operation on the binary image I using a 3 × 3 structuring element (with at most i_max = 1300 iterations, see Alg. 9.3).

void thin(ByteProcessor I, int iMax)

Performs a thinning operation with at most iMax iterations (see Alg. 9.3).

int thinOnce(ByteProcessor I)

Performs a single iteration of the thinning operation and returns the number of pixel deletions (see Alg. 9.3).

The methods listed here always treat image pixels with value 0 as background and values > 0 as foreground. Unlike ImageJ’s built-in implementation of morphological operations (described in Sec. 9.4.4), the display lookup table (LUT, typically only used for display purposes) of the image is not taken into account at all.
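To make the pixel convention concrete (0 = background, any value > 0 = foreground, LUT ignored), here is a minimal sketch of binary dilation and erosion on a plain int array. It is not the imagingbook implementation: the class name BinaryMorph, the [v][u] array layout, the centered origin of H, and the treatment of pixels outside the image as background are all assumptions made for illustration.

```java
/**
 * Hypothetical sketch of binary dilation and erosion (not the imagingbook code).
 * Pixel value 0 = background, any value > 0 = foreground; images are int[v][u].
 * H is a 0/1 structuring element whose origin is assumed to be at its center.
 */
final class BinaryMorph {
    private BinaryMorph() { }

    // Assumption: pixels outside the image count as background.
    private static boolean isFg(int[][] I, int u, int v) {
        return v >= 0 && v < I.length && u >= 0 && u < I[0].length && I[v][u] > 0;
    }

    /** 1 where the reflected structuring element hits at least one foreground pixel. */
    static int[][] dilate(int[][] I, int[][] H) {
        int h = I.length, w = I[0].length;
        int ch = H.length / 2, cw = H[0].length / 2;
        int[][] R = new int[h][w];
        for (int v = 0; v < h; v++) {
            for (int u = 0; u < w; u++) {
                boolean hit = false;
                for (int jj = 0; jj < H.length && !hit; jj++) {
                    for (int ii = 0; ii < H[0].length && !hit; ii++) {
                        // element (i, j) of H contributes the pixel at (u - i, v - j)
                        hit = H[jj][ii] > 0 && isFg(I, u - (ii - cw), v - (jj - ch));
                    }
                }
                R[v][u] = hit ? 1 : 0;
            }
        }
        return R;
    }

    /** 1 where every element of H, placed at (u, v), covers a foreground pixel. */
    static int[][] erode(int[][] I, int[][] H) {
        int h = I.length, w = I[0].length;
        int ch = H.length / 2, cw = H[0].length / 2;
        int[][] R = new int[h][w];
        for (int v = 0; v < h; v++) {
            for (int u = 0; u < w; u++) {
                boolean fits = true;
                for (int jj = 0; jj < H.length && fits; jj++) {
                    for (int ii = 0; ii < H[0].length && fits; ii++) {
                        fits = H[jj][ii] <= 0 || isFg(I, u + (ii - cw), v + (jj - ch));
                    }
                }
                R[v][u] = fits ? 1 : 0;
            }
        }
        return R;
    }
}
```

Note that a pixel value of 255 counts as foreground exactly like 1, regardless of how the image would be displayed.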
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5 See the online documentation for additional methods.


 1  import ij.ImagePlus;
 2  import ij.plugin.filter.PlugInFilter;
 3  import ij.process.ByteProcessor;
 4  import ij.process.ImageProcessor;
 5  import imagingbook.pub.morphology.BinaryMorphologyFilter;
 6  import imagingbook.pub.morphology.BinaryMorphologyFilter.OpType;
 7
 8  public class Bin_Dilate_Disk_Demo implements PlugInFilter {
 9      static double radius = 5.0;
10      static OpType op = OpType.Dilate;  // Erode, Open, Close, ...
11
12      public int setup(String arg, ImagePlus imp) {
13          return DOES_8G;
14      }
15
16      public void run(ImageProcessor ip) {
17          BinaryMorphologyFilter bmf =
18              new BinaryMorphologyFilter.Disk(radius);
19          bmf.applyTo((ByteProcessor) ip, op);
20      }
21  }



Prog. 9.2

Example for using class BinaryMorphologyFilter (see Sec. 9.4.3) inside an ImageJ plugin. The actual filter operator is instantiated in line 18 and subsequently (in line 19) applied to the image ip of type ByteProcessor. Available operations (OpType) are Dilate, Erode, Open, Close, Outline, and Thin. Note that the results depend strictly on the pixel values of the input image, with values 0 taken as background and values > 0 taken as foreground. The display lookup table (LUT) is irrelevant.


The example in Prog. 9.2 shows the use of class BinaryMorphologyFilter in a complete ImageJ plugin that performs dilation with a disk-shaped structuring element of radius 5 (pixel units). Other examples can be found in the online code repository.


9.4.4 Built-in Morphological Operations in ImageJ

Apart from the implementation described in the previous section, the ImageJ API provides built-in methods for basic morphological operations, such as dilate() and erode(). These methods use a 3 × 3 structuring element (analogous to Fig. 9.11(b)) and are only defined for images of type ByteProcessor and ColorProcessor. In the case of RGB color images (ColorProcessor), the morphological operation is applied individually to the three color channels. All these and other morphological operations can be applied interactively through ImageJ’s Process > Binary menu (see Fig. 9.23(a)).

Note that ImageJ’s dilate() and erode() methods use the current settings of the display lookup table (LUT) to discriminate between background and foreground pixels. Thus the results of morphological operations depend not only on the stored pixel values but also on how they are displayed (in addition to the settings in Process > Binary > Options..., see Fig. 9.23(b)).⁶ It is therefore recommended to use the methods (defined for ByteProcessor only)

dilate(int count, int background),
erode(int count, int background)

6 These dependencies may be quite confusing because the same program will produce different results under different user setups.
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Fig. 9.23

Morphological operations in ImageJ’s built-in standard menu Process > Binary (a) and optional settings with Process > Binary > Options... (b). The choice “Black background” specifies whether background pixels are bright or dark, which is taken into account by ImageJ’s morphological operations.
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instead, since they provide explicit control of the background pixel value and are thus independent of other settings. ImageJ’s ByteProcessor class defines additional methods for morphological operations on binary images, such as outline() and skeletonize(). The method outline() implements the extraction of region boundaries using an 8-neighborhood structuring element, as described in Sec. 9.2.7. The method skeletonize(), on the other hand, implements a thinning process similar to Alg. 9.3.
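The outline operation can be sketched as keeping those foreground pixels that do not survive an erosion with the full 3 × 3 (8-neighborhood) structuring element, that is, foreground pixels with at least one background pixel among their 8 neighbors. This is a hypothetical illustration (class OutlineSketch, image as int[v][u]); in particular, treating pixels outside the image as background is an assumption, and ImageJ’s outline() may handle borders differently.

```java
/** Hypothetical sketch: outline = foreground minus its 3x3 erosion. */
final class OutlineSketch {
    private OutlineSketch() { }

    static int[][] outline(int[][] I) {
        int h = I.length, w = I[0].length;
        int[][] R = new int[h][w];
        for (int v = 0; v < h; v++) {
            for (int u = 0; u < w; u++) {
                if (I[v][u] == 0) {
                    continue;                     // background stays background
                }
                boolean interior = true;          // survives erosion with the 3x3 element?
                for (int j = -1; j <= 1 && interior; j++) {
                    for (int i = -1; i <= 1 && interior; i++) {
                        int x = u + i, y = v + j;
                        // assumption: pixels outside the image count as background
                        if (x < 0 || x >= w || y < 0 || y >= h || I[y][x] == 0) {
                            interior = false;
                        }
                    }
                }
                R[v][u] = interior ? 0 : 1;       // keep only pixels removed by erosion
            }
        }
        return R;
    }
}
```

For a filled 4 × 4 square, this keeps the 12 pixels of its one-pixel-wide border.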


9.5 Grayscale Morphology

Morphological operations are not confined to binary images but are also defined for intensity (grayscale) images. In fact, the definition of grayscale morphology is a generalization of binary morphology, with the binary OR and AND operators replaced by the arithmetic MAX and MIN operators, respectively. As a consequence, procedures designed for grayscale morphology can also perform binary morphology (but not the other way around).⁷ In the case of color images, the grayscale operations are usually applied individually to each color channel.
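The replacement of OR by MAX can be checked directly: on an image that contains only the values 0 and 1, taking the maximum over a 3 × 3 neighborhood (a flat structuring element whose values are all 0) gives exactly the logical OR over the same neighborhood. The hypothetical class below compares both at every pixel; the analogous relation between MIN and AND for erosion can be verified the same way.

```java
/**
 * Hypothetical check that MAX (grayscale dilation, flat 3x3 element with all
 * values 0) and OR (binary dilation, full 3x3 element) agree on 0/1 images.
 * Images are int[v][u]; neighbors outside the image are ignored.
 */
final class MaxVersusOr {
    private MaxVersusOr() { }

    static boolean inside(int[][] I, int u, int v) {
        return v >= 0 && v < I.length && u >= 0 && u < I[0].length;
    }

    /** Grayscale dilation at (u, v): max of I(u+i, v+j) + H(i, j), with H(i, j) = 0. */
    static int dilateMax(int[][] I, int u, int v) {
        int m = Integer.MIN_VALUE;
        for (int j = -1; j <= 1; j++) {
            for (int i = -1; i <= 1; i++) {
                if (inside(I, u + i, v + j)) {
                    m = Math.max(m, I[v + j][u + i]);
                }
            }
        }
        return m;
    }

    /** Binary dilation at (u, v): logical OR over the same 3x3 neighborhood. */
    static boolean dilateOr(int[][] I, int u, int v) {
        boolean r = false;
        for (int j = -1; j <= 1; j++) {
            for (int i = -1; i <= 1; i++) {
                if (inside(I, u + i, v + j)) {
                    r = r || I[v + j][u + i] == 1;
                }
            }
        }
        return r;
    }

    /** True if both operations give the same result at every pixel of the 0/1 image I. */
    static boolean agree(int[][] I) {
        for (int v = 0; v < I.length; v++) {
            for (int u = 0; u < I[0].length; u++) {
                if ((dilateMax(I, u, v) == 1) != dilateOr(I, u, v)) {
                    return false;
                }
            }
        }
        return true;
    }
}
```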

9.5.1 Structuring Elements

Unlike in the binary scheme, the structuring elements for grayscale morphology are not defined as point sets but as real-valued 2D functions, that is,

$$H(i,j) \in \mathbb{R}, \quad \text{for } (i,j) \in \mathbb{Z}^2. \tag{9.30}$$

The values in H may be negative or zero. Notice, however, that, in contrast to linear convolution (Sec. 5.3.1), zero elements in grayscale

7 ImageJ provides a single implementation of morphological operations that handles both binary and grayscale images (see Sec. 9.4.4).
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morphology generally do contribute to the result.⁸ The design of structuring elements for grayscale morphology must therefore distinguish explicitly between cells containing the value 0 and empty (“don’t care”) cells, for example,

$$
\begin{array}{|c|c|c|}
\hline
0 & 1 & 0 \\ \hline
1 & 2 & 1 \\ \hline
0 & 1 & 0 \\ \hline
\end{array}
\;\neq\;
\begin{array}{|c|c|c|}
\hline
 & 1 & \\ \hline
1 & 2 & 1 \\ \hline
 & 1 & \\ \hline
\end{array}
\tag{9.31}
$$
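A small numeric example shows why the distinction in Eqn. (9.31) matters. In the sketch below (hypothetical class DontCareDemo; null stands for an empty “don’t care” cell), the grayscale dilation of Eqn. (9.32) is evaluated at the center of a single 3 × 3 patch with both structuring elements of Eqn. (9.31). The zero-valued corner cells pass the bright corner pixels through unchanged, while the empty cells ignore them.

```java
/** Hypothetical demo: zero-valued cells vs. empty ("don't care") cells. */
final class DontCareDemo {
    private DontCareDemo() { }

    /** Grayscale dilation result at the center of a 3x3 patch; null cells of H are skipped. */
    static int dilateAtCenter(int[][] patch, Integer[][] H) {
        int max = Integer.MIN_VALUE;
        for (int j = 0; j < 3; j++) {
            for (int i = 0; i < 3; i++) {
                if (H[j][i] != null) {                      // empty cells do not contribute
                    max = Math.max(max, patch[j][i] + H[j][i]);
                }
            }
        }
        return max;
    }
}
```

For the patch {{9,1,9},{1,1,1},{9,1,9}}, the element with zero corners yields 9 (= 9 + 0), whereas the element with empty corners yields 3 (= 1 + 2).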


9.5.2 Dilation and Erosion

The result of grayscale dilation I ⊕ H is defined as the maximum of the values in H added to the values of the current subimage of I, that is,

$$(I \oplus H)(u,v) = \max_{(i,j) \in H} \bigl( I(u+i, v+j) + H(i,j) \bigr). \tag{9.32}$$

Similarly, the result of grayscale erosion is the minimum of the differences,

$$(I \ominus H)(u,v) = \min_{(i,j) \in H} \bigl( I(u+i, v+j) - H(i,j) \bigr). \tag{9.33}$$

Figures 9.24 and 9.25 demonstrate the basic process of grayscale dilation and erosion, respectively, on a simple example.
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In general, either operation may produce negative results (and dilation may also exceed the maximum pixel value), which must be considered if the range of pixel values is restricted, for example, by clamping the results (see Ch. 4, Sec. 4.1.2). Some examples of grayscale dilation and erosion on natural images using disk-shaped structuring elements of various sizes are shown in Fig. 9.26. Figure 9.28 demonstrates the same operations with some freely designed structuring elements.
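Eqns. (9.32) and (9.33), together with clamping to an 8-bit range, can be sketched in Java as follows. This is an illustrative implementation only. The class name GrayMorph, the use of null for “don’t care” cells, the placement of the origin at the center of H, the clamping range [0, 255], and the decision to skip positions outside the image are assumptions, not the book’s or ImageJ’s code.

```java
/** Hypothetical sketch of grayscale dilation/erosion (Eqns. 9.32, 9.33) with clamping. */
final class GrayMorph {
    private GrayMorph() { }

    // Clamp to the 8-bit range [0, 255] (assumed pixel range).
    static int clamp(int a) {
        return a < 0 ? 0 : (a > 255 ? 255 : a);
    }

    /** Eqn. (9.32): max over (i,j) in H of I(u+i, v+j) + H(i,j). */
    static int[][] dilate(int[][] I, Integer[][] H) {
        return apply(I, H, true);
    }

    /** Eqn. (9.33): min over (i,j) in H of I(u+i, v+j) - H(i,j). */
    static int[][] erode(int[][] I, Integer[][] H) {
        return apply(I, H, false);
    }

    private static int[][] apply(int[][] I, Integer[][] H, boolean dilate) {
        int h = I.length, w = I[0].length;
        int ch = H.length / 2, cw = H[0].length / 2;   // origin at the center of H
        int[][] R = new int[h][w];
        for (int v = 0; v < h; v++) {
            for (int u = 0; u < w; u++) {
                int best = dilate ? Integer.MIN_VALUE : Integer.MAX_VALUE;
                for (int jj = 0; jj < H.length; jj++) {
                    for (int ii = 0; ii < H[0].length; ii++) {
                        Integer hv = H[jj][ii];
                        int x = u + ii - cw, y = v + jj - ch;
                        // skip "don't care" cells and positions outside the image
                        if (hv == null || x < 0 || x >= w || y < 0 || y >= h) {
                            continue;
                        }
                        best = dilate ? Math.max(best, I[y][x] + hv)
                                      : Math.min(best, I[y][x] - hv);
                    }
                }
                R[v][u] = clamp(best);
            }
        }
        return R;
    }
}
```

In this sketch, a bright value of 250 dilated with H(i,j) = 10 would reach 260 and is clamped to 255; likewise an erosion of a dark image can go below zero and is clamped to 0.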

9.5.3 Grayscale Opening and Closing

Opening and closing on grayscale images are defined, as in the binary case (Eqns. (9.21) and (9.22)), as operations composed

8 In contrast, a zero coefficient in a linear convolution matrix simply means that the corresponding image pixel is ignored.
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Fig. 9.24

Grayscale dilation I ⊕ H. The 3 × 3 pixel structuring element H is placed on the image I in the upper left position. Each value of H is added to the corresponding element of I; the intermediate result (I + H) for this particular position is shown below. Its maximum value 8 = 7 + 1 is inserted into the result (I ⊕ H) at the current position of the filter origin. The results for three other filter positions are also shown.
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Fig. 9.25

Grayscale erosion I ⊖ H. The 3 × 3 pixel structuring element H is placed on the image I in the upper left position. Each value of H is subtracted from the corresponding element of I; the intermediate result (I − H) for this particular position is shown below. Its minimum value 3 − 1 = 2 is inserted into the result (I ⊖ H) at the current position of the filter origin. The results for three other filter positions are also shown.


Fig. 9.26

Grayscale dilation and erosion with disk-shaped structuring elements. The radius r of the structuring element is 2.5 (a), 5.0 (b), and 10.0 (c).
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


(a)  r  =  2.5 


(b)  r  =  5.0 


(c)  r  =  10.0 


Fig.  9.27 

Grayscale  opening  and  closing 
with  disk-shaped  structuring 
elements.  The  radius  r  of  the 
structuring  element  is  2.5  (a), 
5.0  (b),  and  10.0  (c). 


of dilation and erosion with the same structuring element. Some examples are shown in Fig. 9.27 for disk-shaped structuring elements and in Fig. 9.29 for various nonstandard structuring elements. Notice that interesting effects can be obtained, particularly from structuring elements resembling the shape of a brush or other stroke patterns.
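Since opening and closing are simply compositions of the two basic operations, a short sketch suffices. To keep it self-contained, it uses a flat 3 × 3 structuring element (all values 0), for which dilation and erosion reduce to local maximum and minimum filters (cf. Exercise 9.4); the class name GrayOpenClose and the handling of image borders (outside pixels ignored) are assumptions. Opening removes bright details smaller than the structuring element, and closing fills correspondingly small dark details.

```java
/** Hypothetical sketch: grayscale opening/closing with a flat 3x3 structuring element. */
final class GrayOpenClose {
    private GrayOpenClose() { }

    // Flat 3x3 element (all cells 0): dilation = local max, erosion = local min.
    static int[][] dilate(int[][] I) { return filter(I, true); }
    static int[][] erode(int[][] I)  { return filter(I, false); }

    static int[][] open(int[][] I)  { return dilate(erode(I)); }  // erosion, then dilation
    static int[][] close(int[][] I) { return erode(dilate(I)); }  // dilation, then erosion

    private static int[][] filter(int[][] I, boolean useMax) {
        int h = I.length, w = I[0].length;
        int[][] R = new int[h][w];
        for (int v = 0; v < h; v++) {
            for (int u = 0; u < w; u++) {
                int best = I[v][u];                     // the center pixel is always inside
                for (int j = -1; j <= 1; j++) {
                    for (int i = -1; i <= 1; i++) {
                        int x = u + i, y = v + j;
                        if (x < 0 || x >= w || y < 0 || y >= h) {
                            continue;                   // ignore positions outside the image
                        }
                        best = useMax ? Math.max(best, I[y][x]) : Math.min(best, I[y][x]);
                    }
                }
                R[v][u] = best;
            }
        }
        return R;
    }
}
```

For a 5 × 5 image with a single bright pixel, open() returns a flat image, whereas a 3 × 3 bright block survives unchanged because the structuring element fits inside it.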

As mentioned in Sec. 9.4.4, the morphological operations available in ImageJ can be applied to binary images as well as grayscale images. In addition, several plugins and complete morphological packages are available online,⁹ including the morphology operators by Gabriel Landini and the Grayscale Morphology package by Dimiter Prodanov, which allows structuring elements to be specified interactively (a modified version was used for some examples in this chapter).


9.6 Exercises

Exercise 9.1. Manually calculate the results of dilation and erosion for the following image I and the structuring elements H1 and H2:

9 See http://rsb.info.nih.gov/ij/plugins/.
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Fig. 9.28

Grayscale dilation and erosion with various free-form structuring elements.


Exercise 9.2. Assume that a binary image I contains unwanted foreground spots with a maximum diameter of 5 pixels that should be removed without damaging the remaining structures. Design a suitable morphological procedure, and evaluate its performance on appropriate test images.

Exercise 9.3. Investigate whether the results of the thinning operation described in Alg. 9.3 (and implemented by the thin() method of class BinaryMorphologyFilter) are invariant under rotating the image
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


Fig. 9.29

Grayscale opening and closing with various free-form structuring elements.


by 90° and under horizontal or vertical mirroring. Use appropriate test images to see if the results are identical.

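Exercise 9.4. Show that, in the special case of the structuring elements with the contents

$$
\begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix} \ \text{for binary and} \
\begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix} \ \text{for grayscale morphology,}
$$

dilation is equivalent to a 3 × 3 pixel maximum filter and erosion is equivalent to a 3 × 3 pixel minimum filter (see Ch. 5, Sec. 5.4.1).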

Exercise 9.5. Thinning can be applied to extract the “skeleton” of a binary region, which in turn can be used to characterize the shape of the region. A common approach is to partition the skeleton into a graph, consisting of nodes and connecting segments, as a shape representation (see Fig. 9.30 for an example). Use ImageJ’s skeletonize() method or the thin() method of class BinaryMorphologyFilter (see Sec. 9.4.3) to generate the skeleton, then locate and mark the connecting and terminal nodes of this structure. Define precisely the properties of each type of node and use this definition in your implementation. Test your implementation on different examples. How would you generally judge the robustness of this approach as a 2D shape representation?

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Fig. 9.30

Segmentation of a region skeleton. Original binary image (a) and the skeleton obtained by thinning (b). Terminal nodes are marked green, connecting (inner) nodes are marked red.
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10 

Regions  in  Binary  Images 


In  a  binary  image,  pixels  can  take  on  exactly  one  of  two  values. 
These  values  are  often  thought  of  as  representing  the  “foreground” 
and  “background”  in  the  image,  even  though  these  concepts  often 
are  not  applicable  to  natural  scenes.  In  this  chapter  we  focus  on 
connected  regions  in  images  and  how  to  isolate  and  describe  such 
structures. 

Let  us  assume  that  our  task  is  to  devise  a  procedure  for  finding 
the  number  and  type  of  objects  contained  in  an  image  as  shown  in 
Fig.  10.1.  As  long  as  we  continue  to  consider  each  pixel  in  isolation, 
we  will  not  be  able  to  determine  how  many  objects  there  are  overall  in 
the  image,  where  they  are  located,  and  which  pixels  belong  to  which 
objects.  Therefore  our  first  step  is  to  find  each  object  by  grouping 
together  all  the  pixels  that  belong  to  it.  In  the  simplest  case,  an 
object  is  a  group  of  touching  foreground  pixels,  that  is,  a  connected 
binary  region  or  “component”. 


Fig. 10.1

Binary image with nine components. Each component corresponds to a connected region of (black) foreground pixels.
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10.1 Finding Connected Image Regions


In the search for binary regions, the most important tasks are to find out which pixels belong to which regions, how many regions are in the image, and where these regions are located. These steps usually take place as part of a process called region labeling or region coloring. During this process, neighboring pixels are pieced together in a stepwise manner to build regions in which all pixels within that region are assigned a unique number (“label”) for identification. In the following sections, we describe two variations on this idea. In the first method, region marking through flood filling, a region is filled in all directions starting from a single point or “seed” within the region. In the second method, sequential region marking, the image is traversed from top to bottom, marking regions as they are encountered. In Sec. 10.2.2, we describe a third method that combines two useful processes, region labeling and contour finding, in a single algorithm.

Independent of which of these methods we use, we must first settle on either the 4- or 8-connected definition of neighboring (see Ch. 9, Fig. 9.5) for determining when two pixels are “connected” to each other, since under each definition we can end up with different results. In the following region-marking algorithms, we use the following convention: the original binary image I(u,v) contains the values 0 and 1 to mark the background and foreground, respectively; any other value is used for numbering (labeling) the regions, that is, the pixel values are

$$
I(u,v) =
\begin{cases}
0 & \text{background}, \\
1 & \text{foreground}, \\
2, 3, \ldots & \text{region label}.
\end{cases}
\tag{10.1}
$$


10.1.1 Region Labeling by Flood Filling

The underlying algorithm for region marking by flood filling is simple: search for an unmarked foreground pixel and then fill (visit and mark) all the rest of the neighboring pixels in its region. This operation is called a “flood fill” because it is as if a flood of water erupts at the start pixel and flows out across a flat region. There are various methods for carrying out the fill operation that ultimately differ in how to select the coordinates of the next pixel to be visited during the fill. We present three different ways of performing the FloodFill() procedure: a recursive version, an iterative depth-first version, and an iterative breadth-first version (see Alg. 10.1):

A. Recursive Flood Filling: The recursive version (Alg. 10.1, line 8) does not make use of explicit data structures to keep track of the image coordinates but uses the local variables that are implicitly allocated by recursive procedure calls.¹ Within each region, a tree structure, rooted at the starting point, is defined by the neighborhood relation between pixels. The recursive step corresponds to a depth-first traversal [54] of this tree and results
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1  In  Java,  and  similar  imperative  programming  languages  such  as  C  and 
C++,  local  variables  are  automatically  stored  on  the  call  stack  at  each 
procedure  call  and  restored  from  the  stack  when  the  procedure  returns. 


1:  RegionLabeling(I)
        Input: I, an integer-valued image with initial values 0 = background, 1 = foreground. Returns nothing but modifies the image I.
2:      label ← 2                                  ▷ value of the next label to be assigned
3:      for all image coordinates u, v do
4:          if I(u,v) = 1 then                     ▷ a foreground pixel
5:              FloodFill(I, u, v, label)          ▷ any of the 3 versions below
6:              label ← label + 1
7:      return

8:  FloodFill(I, u, v, label)                      ▷ Recursive Version
9:      if u, v is within the image boundaries and I(u,v) = 1 then
10:         I(u,v) ← label
11:         FloodFill(I, u+1, v, label)            ▷ recursive call to FloodFill()
12:         FloodFill(I, u, v+1, label)
13:         FloodFill(I, u, v−1, label)
14:         FloodFill(I, u−1, v, label)
15:     return

16: FloodFill(I, u, v, label)                      ▷ Depth-First Version
17:     S ← ( )                                    ▷ create an empty stack S
18:     S ← (u,v) ⌢ S                              ▷ push seed coordinate (u,v) onto S
19:     while S ≠ ( ) do                           ▷ while S is not empty
20:         (x,y) ← GetFirst(S)
21:         S ← Delete((x,y), S)                   ▷ pop first coordinate off the stack
22:         if x, y is within the image boundaries and I(x,y) = 1 then
23:             I(x,y) ← label
24:             S ← (x+1, y) ⌢ S                   ▷ push (x+1, y) onto S
25:             S ← (x, y+1) ⌢ S                   ▷ push (x, y+1) onto S
26:             S ← (x, y−1) ⌢ S                   ▷ push (x, y−1) onto S
27:             S ← (x−1, y) ⌢ S                   ▷ push (x−1, y) onto S
28:     return

29: FloodFill(I, u, v, label)                      ▷ Breadth-First Version
30:     Q ← ( )                                    ▷ create an empty queue Q
31:     Q ← Q ⌢ (u,v)                              ▷ append seed coordinate (u,v) to Q
32:     while Q ≠ ( ) do                           ▷ while Q is not empty
33:         (x,y) ← GetFirst(Q)
34:         Q ← Delete((x,y), Q)                   ▷ dequeue first coordinate
35:         if x, y is within the image boundaries and I(x,y) = 1 then
36:             I(x,y) ← label
37:             Q ← Q ⌢ (x+1, y)                   ▷ append (x+1, y) to Q
38:             Q ← Q ⌢ (x, y+1)                   ▷ append (x, y+1) to Q
39:             Q ← Q ⌢ (x, y−1)                   ▷ append (x, y−1) to Q
40:             Q ← Q ⌢ (x−1, y)                   ▷ append (x−1, y) to Q
41:     return

Alg. 10.1

Region marking by flood filling. The binary input image I uses the value 0 for background pixels and 1 for foreground pixels. Unmarked foreground pixels are searched for, and then the region to which they belong is filled. Procedure FloodFill() is defined in three different versions: recursive, depth-first, and breadth-first.


in very short and elegant program code. Unfortunately, since the maximum depth of the recursion (and thus the size of the required stack memory) is proportional to the size of the region, stack memory is quickly exhausted. Therefore this method is risky and really only practical for very small images.
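Version A of Alg. 10.1 translates almost line by line into Java. The sketch below (hypothetical class RecursiveLabeling, operating on an int array indexed [v][u] rather than an ImageJ ImageProcessor) combines the RegionLabeling() driver with the recursive FloodFill(). As explained above, it is only safe for small regions, since the recursion depth grows with the size of the region.

```java
/** Hypothetical sketch of Alg. 10.1: region labeling with recursive flood filling. */
final class RecursiveLabeling {
    private RecursiveLabeling() { }

    /** Labels all 4-connected regions of value 1 with 2, 3, ...; returns the number of regions. */
    static int labelRegions(int[][] I) {
        int label = 2;                                  // value of the next label to be assigned
        for (int v = 0; v < I.length; v++) {
            for (int u = 0; u < I[0].length; u++) {
                if (I[v][u] == 1) {                     // unlabeled foreground pixel
                    floodFill(I, u, v, label);
                    label++;
                }
            }
        }
        return label - 2;
    }

    // Recursive version: stop at the image border or at non-foreground pixels.
    private static void floodFill(int[][] I, int u, int v, int label) {
        if (v < 0 || v >= I.length || u < 0 || u >= I[0].length || I[v][u] != 1) {
            return;
        }
        I[v][u] = label;
        floodFill(I, u + 1, v, label);
        floodFill(I, u, v + 1, label);
        floodFill(I, u, v - 1, label);
        floodFill(I, u - 1, v, label);
    }
}
```

For the 4 × 4 image {{1,1,0,0},{0,1,0,1},{0,0,0,1},{1,0,0,0}}, the method finds three regions, labeled 2, 3, and 4 in scan order.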
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B. Iterative Flood Filling (depth-first): Every recursive algorithm can also be reformulated as an iterative algorithm (Alg. 10.1, line 16) by implementing and managing its own stack. In this case, the stack records the “open” (that is, the adjacent but not yet visited) elements. As in the recursive version (A), the corresponding tree of pixels is traversed in depth-first order. By making use of its own dedicated stack (which is created in the much larger heap memory), the depth of the tree is no longer limited to the size of the call stack.

C. Iterative Flood Filling (breadth-first): In this version, pixels are traversed in a way that resembles an expanding wave front propagating out from the starting point (Alg. 10.1, line 29). The data structure used to hold the as yet unvisited pixel coordinates is in this case a queue instead of a stack, but otherwise it is identical to version B.

Java implementation

The recursive version (A) of the algorithm is an exact blueprint of the Java implementation. However, a normal Java runtime environment does not support more than about 10,000 recursive calls of the FloodFill() procedure (Alg. 10.1, line 8) before the memory allocated for the call stack is exhausted. This is only sufficient for relatively small images with fewer than approximately 200 × 200 pixels.

Program 10.1 gives the complete Java implementation for both variants of the iterative FloodFill() procedure (lines 1–17 and 18–34, respectively). The stack (S) in the depth-first version (B) and the queue (Q) in the breadth-first variant (C) are both implemented as instances of type LinkedList.² The type parameter <Point> is specified for both generic container classes so that they can only contain objects of type Point.³

Figure 10.2 illustrates the progress of the region marking in both variants within an example region, where the start point (i.e., seed point), which would normally lie on a contour edge, has been placed arbitrarily within the region in order to better illustrate the process. It is clearly visible that the depth-first method first explores one direction (in this case horizontally to the left) completely (that is, until it reaches the edge of the region) and only then examines the remaining directions. In contrast, the markings of the breadth-first method proceed outward, layer by layer, equally in all directions.

Due to the way exploration takes place, the memory requirement of the breadth-first variant of flood filling is generally much lower than that of the depth-first variant. For example, when flood filling the region in Fig. 10.2 (using the implementation given in Prog. 10.1), the stack in the depth-first variant grows to a maximum of 28,822 elements, while the queue used by the breadth-first variant never exceeds a maximum of 438 nodes.
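The difference in memory use can be reproduced with a small experiment. The following sketch (hypothetical class FillMemory) runs both iterative variants on the same kind of region and reports the peak number of pending coordinates. It uses java.util.ArrayDeque instead of LinkedList and int pairs instead of Point objects; the push order matches Prog. 10.1, so the depth-first variant also explores to the left first. The specific counts quoted above for Fig. 10.2 are not reproduced here.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical experiment: peak container size of depth-first vs. breadth-first flood fill. */
final class FillMemory {
    private FillMemory() { }

    /** Flood fills from (u, v) and returns the largest number of pending coordinates seen. */
    static int maxPending(int[][] I, int u, int v, int label, boolean depthFirst) {
        int h = I.length, w = I[0].length;
        Deque<int[]> D = new ArrayDeque<>();
        D.addLast(new int[] {u, v});
        int maxSize = 1;
        while (!D.isEmpty()) {
            // stack (LIFO) for depth-first, queue (FIFO) for breadth-first
            int[] p = depthFirst ? D.pollLast() : D.pollFirst();
            int x = p[0], y = p[1];
            if (x >= 0 && x < w && y >= 0 && y < h && I[y][x] == 1) {
                I[y][x] = label;
                D.addLast(new int[] {x + 1, y});
                D.addLast(new int[] {x, y + 1});
                D.addLast(new int[] {x, y - 1});
                D.addLast(new int[] {x - 1, y});
                maxSize = Math.max(maxSize, D.size());
            }
        }
        return maxSize;
    }
}
```

On a completely filled 60 × 60 square seeded at its center, the depth-first peak exceeds the breadth-first peak, in line with the observation above.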

2  The  class  LinkedList  is  part  of  Java’s  collections  framework. 

3 Note that the depth-first and breadth-first implementations in Prog. 10.1 typically run slower than the recursive version described in Alg. 10.1, since they allocate (and immediately discard) large numbers of Point objects. A better solution is to use a queue or stack with elements of a primitive type (e.g., int) instead. See also Exercise 10.3.
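Following the suggestion in the footnote, one possible primitive-type variant stores each coordinate pair as a single int index y · w + x in a growable int array. This is a sketch under these assumptions (hypothetical class IntStackFill, image as int[v][u], seed assumed to lie inside the image), not the book’s solution to Exercise 10.3. Only in-bounds neighbors are pushed, so every stored index can be decoded safely.

```java
import java.util.Arrays;

/** Hypothetical depth-first flood fill using a primitive int stack. */
final class IntStackFill {
    private IntStackFill() { }

    /** Labels the 4-connected region of value 1 containing (u, v); returns the pixel count. */
    static int floodFill(int[][] I, int u, int v, int label) {
        int h = I.length, w = I[0].length;
        int[] stack = new int[64];
        int top = 0;
        stack[top++] = v * w + u;                 // encode (x, y) as y * w + x
        int count = 0;
        while (top > 0) {
            int k = stack[--top];
            int x = k % w, y = k / w;
            if (I[y][x] != 1) {
                continue;                         // background or already labeled
            }
            I[y][x] = label;
            count++;
            if (top + 4 > stack.length) {
                stack = Arrays.copyOf(stack, 2 * stack.length);   // grow as needed
            }
            // push only neighbors inside the image (same order as Prog. 10.1)
            if (x + 1 < w)  stack[top++] = k + 1;
            if (y + 1 < h)  stack[top++] = k + w;
            if (y - 1 >= 0) stack[top++] = k - w;
            if (x - 1 >= 0) stack[top++] = k - 1;
        }
        return count;
    }
}
```

Because no Point objects are created, filling even a 500 × 500 region only costs one growing int array.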
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Depth-first version (using a stack):

 1  void floodFill(int u, int v, int label) {
 2      Deque<Point> S = new LinkedList<Point>();  // stack S
 3      S.push(new Point(u, v));
 4      while (!S.isEmpty()) {
 5          Point p = S.pop();
 6          int x = p.x;
 7          int y = p.y;
 8          if ((x >= 0) && (x < width) && (y >= 0) && (y < height)
 9                  && ip.getPixel(x, y) == 1) {
10              ip.putPixel(x, y, label);
11              S.push(new Point(x + 1, y));
12              S.push(new Point(x, y + 1));
13              S.push(new Point(x, y - 1));
14              S.push(new Point(x - 1, y));
15          }
16      }
17  }

Prog. 10.1

Java implementation of iterative flood filling (depth-first and breadth-first variants).


Breadth-first version (using a queue):

18  void floodFill(int u, int v, int label) {
19      Queue<Point> Q = new LinkedList<Point>();  // queue Q
20      Q.add(new Point(u, v));
21      while (!Q.isEmpty()) {
22          Point p = Q.remove();  // get the next point to process
23          int x = p.x;
24          int y = p.y;
25          if ((x >= 0) && (x < width) && (y >= 0) && (y < height)
26                  && ip.getPixel(x, y) == 1) {
27              ip.putPixel(x, y, label);
28              Q.add(new Point(x + 1, y));
29              Q.add(new Point(x, y + 1));
30              Q.add(new Point(x, y - 1));
31              Q.add(new Point(x - 1, y));
32          }
33      }
34  }


10.1.2 Sequential Region Labeling

Sequential region marking is a classical, nonrecursive technique that is known in the literature as “region labeling”. The algorithm consists of two steps: (1) preliminary labeling of the image regions and (2) resolving cases where more than one label occurs (i.e., has been assigned in the previous step) in the same connected region. Even though this algorithm is relatively complex, especially its second stage, its moderate memory requirements make it a good choice under limited memory conditions. However, this is not a major issue on modern computers and thus, in terms of overall efficiency, sequential labeling offers no clear advantage over the simpler methods described earlier. The sequential technique is nevertheless interesting (not only from a historic perspective) and inspiring. The complete process is summarized in Alg. 10.2, with the following main steps:
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Fig. 10.2

Iterative flood filling: comparison between the depth-first and breadth-first approach. The starting point, marked + in the top two images (a), was arbitrarily chosen. Intermediate results of the flood fill process after 1000 (a), 5000 (b), and 10,000 (c) marked pixels are shown. The image size is 250 × 242 pixels.

[Figure panels: original image; depth-first and breadth-first results; (c) K = 10,000]


Step 1: Initial labeling

In the first stage of region labeling, the image is traversed sequentially from top left to bottom right to assign a preliminary label to every foreground pixel. Depending on the definition of neighborhood used (either 4- or 8-connected), the following neighbors in the direct vicinity of each pixel must be examined (× marks the current pixel at position (u, v)):
SequentialLabeling(I)
    Input: I, an integer-valued image with initial values 0 = background, 1 = foreground.
    Returns nothing but modifies the image I.

    Step 1: Assign initial labels:
    (M, N) ← Size(I)
    label ← 2                                  ▷ value of the next label to be assigned
    C ← ( )                                    ▷ empty list of label collisions
    for v ← 0, ..., N−1 do
        for u ← 0, ..., M−1 do
            if I(u, v) = 1 then                ▷ I(u, v) is a foreground pixel
                𝒩 ← GetNeighbors(I, u, v)      ▷ see Eqn. 10.2
                if N_i = 0 for all N_i ∈ 𝒩 then
                    I(u, v) ← label
                    label ← label + 1
                else if exactly one N_j ∈ 𝒩 has a value > 1 then
                    I(u, v) ← N_j
                else if more than one N_k ∈ 𝒩 have values > 1 then
                    I(u, v) ← N_k              ▷ select one N_k > 1 as the new label
                    for all N_l ∈ 𝒩, with l ≠ k and N_l > 1 do
                        C ← C ⌢ (N_k, N_l)     ▷ register collision (N_k, N_l)
    Remark: The image I now contains labels 0, 2, ..., label−1.

    Step 2: Resolve label collisions:
    Create a partitioning of the label set (sequence of 1-element sets):
    R ← ({2}, {3}, {4}, ..., {label−1})
    for all collisions (A, B) in C do
        Find the sets R(a), R(b) holding labels A, B:
        a ← index of the set R(a) that contains label A
        b ← index of the set R(b) that contains label B
        if a ≠ b then                          ▷ A and B are contained in different sets
            R(a) ← R(a) ∪ R(b)                 ▷ merge elements of R(b) into R(a)
            R(b) ← {}
    Remark: All equivalent labels (i.e., all labels of pixels in the same connected component) are now contained in the same subset of R.

    Step 3: Relabel the image:
    for all (u, v) ∈ M × N do
        if I(u, v) > 1 then                    ▷ this is a labeled foreground pixel
            j ← index of the set R(j) that contains label I(u, v)
            Choose a representative element k from the set R(j):
            k ← min(R(j))                      ▷ e.g., pick the minimum value
            I(u, v) ← k                        ▷ replace the image label
    return


10.1 Finding Connected Image Regions

Alg. 10.2

Sequential region labeling. The binary input image I uses the value I(u, v) = 0 for background pixels and I(u, v) = 1 for foreground (region) pixels. The resulting labels have the values 2, ..., label − 1.


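$$
\mathcal{N}_4(u,v) =
\begin{bmatrix}
  \cdot & N_2 & \cdot \\
  N_1 & \times & \cdot \\
  \cdot & \cdot & \cdot
\end{bmatrix}
\quad\text{or}\quad
\mathcal{N}_8(u,v) =
\begin{bmatrix}
  N_2 & N_3 & N_4 \\
  N_1 & \times & \cdot \\
  \cdot & \cdot & \cdot
\end{bmatrix}
\tag{10.2}
$$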
When using the 4-connected neighborhood 𝒩4, only the two neighbors N1 = I(u−1, v) and N2 = I(u, v−1) need to be considered, but when using the 8-connected neighborhood 𝒩8, all four neighbors N1, ..., N4 must be examined. In the following examples (Figs. 10.3–10.5), we use an 8-connected neighborhood and a very simple test image (Fig. 10.3(a)) to demonstrate the sequential region labeling process.

Propagating region labels

Again we assume that, in the image, the value I(u, v) = 0 represents background pixels and the value I(u, v) = 1 represents foreground pixels. We will also consider neighboring pixels that lie outside of the image matrix (e.g., beyond the array borders) to be part of the background. The neighborhood region 𝒩(u, v) is slid over the image horizontally and then vertically, starting from the top left corner. When the current image element I(u, v) is a foreground pixel, it is either assigned a new region number or, in the case where one of its previously examined neighbors in 𝒩(u, v) was a foreground pixel, it takes on the region number of that neighbor. In this way, existing region numbers propagate in the image from left to right and from top to bottom, as shown in Fig. 10.3(b–c).

Label collisions

If two or more neighbors have labels belonging to different regions, then a label collision has occurred; that is, pixels within a single connected region have different labels. For example, in a U-shaped region, the pixels in the left and right arms are at first assigned different labels since it is not immediately apparent that they are actually part of a single region. The two labels will propagate down independently from each other until they eventually collide in the lower part of the “U” (Fig. 10.3(d)).

When two labels a, b collide, then we know that they are actually “equivalent”; that is, they are contained in the same image region. These collisions are registered but otherwise not dealt with during the first step. Once all collisions have been registered, they are resolved in the second step of the algorithm. The number of collisions depends on the content of the image. There can be only a few or very many collisions, and the exact number is only known at the end of the first step, once the whole image has been traversed. For this reason, collision management must make use of dynamic data structures such as lists or hash tables.

Upon completion of the first step, all the original foreground pixels have been provisionally marked, and all the collisions between labels within the same regions have been registered for subsequent processing. The example in Fig. 10.4 illustrates the state upon completion of step 1: all foreground pixels have been assigned preliminary labels (Fig. 10.4(a)), and the collisions (depicted by circles) between the labels (2, 4), (2, 5), and (2, 6) have been registered. The labels ℒ = {2, 3, 4, 5, 6, 7} and collisions 𝒞 = {(2, 4), (2, 5), (2, 6)} correspond to the nodes and edges of an undirected graph (Fig. 10.4(b)).
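Step 1 can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not the book's implementation: the image is an int[][] indexed img[v][u] holding 0 (background) and 1 (foreground), the four previously visited neighbors of the 8-neighborhood are examined, pixels outside the image count as background, and collisions are stored as {a, b} pairs in an ArrayList (one of the dynamic structures mentioned above). The class name SequentialLabelingPass1 is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of step 1 of sequential region labeling (8-neighborhood). */
class SequentialLabelingPass1 {

    /** Label collisions registered during the pass, each stored as {a, b}. */
    final List<int[]> collisions = new ArrayList<>();

    /** Value of the next label to be assigned (labels start at 2). */
    int nextLabel = 2;

    /** Assigns preliminary labels in place; img[v][u] is 0 (background) or 1 (foreground). */
    void run(int[][] img) {
        int h = img.length, w = img[0].length;
        for (int v = 0; v < h; v++) {
            for (int u = 0; u < w; u++) {
                if (img[v][u] != 1) continue;              // not an unlabeled foreground pixel
                // previously visited neighbors: left, upper-left, above, upper-right
                int[] nb = { get(img, u - 1, v), get(img, u - 1, v - 1),
                             get(img, u, v - 1), get(img, u + 1, v - 1) };
                int k = 0;                                  // first label found among the neighbors
                for (int n : nb) {
                    if (n > 1) { k = n; break; }
                }
                if (k == 0) {                               // background neighbors only
                    img[v][u] = nextLabel++;
                } else {
                    img[v][u] = k;                          // propagate label k
                    for (int n : nb) {                      // any other label collides with k
                        if (n > 1 && n != k) collisions.add(new int[] { k, n });
                    }
                }
            }
        }
    }

    /** Returns img[v][u], treating positions outside the image as background (0). */
    private static int get(int[][] img, int u, int v) {
        return (v >= 0 && v < img.length && u >= 0 && u < img[v].length) ? img[v][u] : 0;
    }
}
```

For the U-shaped test image {{1,0,1},{1,0,1},{1,1,1}}, the two arms first receive labels 2 and 3, and the collision (2, 3) is registered (twice) in the bottom row, where the arms meet.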


(Fig. 10.3 panels: (a) original image with background (0) and foreground (1) pixels; (b) background neighbors only, new label (2).)


Fig. 10.3

Sequential region labeling: label propagation. Original image (a). The first foreground pixel (marked 1) is found in (b): all neighbors are background pixels (marked 0), and the pixel is assigned the first label (2). In the next step (c), there is exactly one neighbor pixel marked with the label 2, so this value is propagated. In (d) there are two neighboring pixels, and they have differing labels (2 and 5); one of these values is propagated, and the collision (2, 5) is registered.

(Panels: (c) exactly one neighbor label; (d) two different neighbor labels, one of the labels (2) is propagated.)
Fig. 10.4

Sequential region labeling: intermediate result after step 1. Label collisions are indicated by circles (a); the nodes of the undirected graph (b) correspond to the labels, and its edges correspond to the collisions.


Step 2: Resolving label collisions

The task in the second step is to resolve the label collisions that arose in the first step in order to merge the corresponding “partial” regions. This process is nontrivial since two regions with different labels may be connected transitively (e.g., (a, b) ∧ (b, c) ⇒ (a, c)) through a third region or, more generally, through a series of regions.

In fact, this problem is identical to the problem of finding the connected components of a graph [54], where the labels ℒ determined in step 1 constitute the “nodes” of the graph and the registered collisions 𝒞 make up its “edges” (Fig. 10.4(b)).
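One common way to find these connected components is a disjoint-set (union-find) structure, used here in place of the explicit set partitioning R of Alg. 10.2. In the following sketch (hypothetical class CollisionResolver; labels assumed to run from 2 to maxLabel, i.e., label − 1 in Alg. 10.2), the larger root is always attached to the smaller one, so the representative of each set is its smallest label, which matches the choice k ← min(R(j)) in step 3; relabel() then performs the image update.

```java
import java.util.List;

/** Hypothetical sketch of steps 2-3: resolve label collisions with union-find, then relabel. */
class CollisionResolver {

    /**
     * Returns map such that map[a] is the smallest label equivalent to a
     * (for all a in 0..maxLabel; labels without collisions map to themselves).
     */
    static int[] resolve(List<int[]> collisions, int maxLabel) {
        int[] parent = new int[maxLabel + 1];
        for (int i = 0; i <= maxLabel; i++) parent[i] = i;     // each label in its own set
        for (int[] c : collisions) {
            int ra = find(parent, c[0]);
            int rb = find(parent, c[1]);
            if (ra != rb) {                                    // merge: smaller label becomes root
                parent[Math.max(ra, rb)] = Math.min(ra, rb);
            }
        }
        int[] map = new int[maxLabel + 1];
        for (int i = 0; i <= maxLabel; i++) map[i] = find(parent, i);
        return map;
    }

    /** Replaces every label > 1 in img[v][u] by its representative. */
    static void relabel(int[][] img, int[] map) {
        for (int[] row : img) {
            for (int u = 0; u < row.length; u++) {
                if (row[u] > 1) row[u] = map[row[u]];
            }
        }
    }

    /** Root (representative) of label a, with path halving. */
    private static int find(int[] parent, int a) {
        while (parent[a] != a) {
            parent[a] = parent[parent[a]];   // path halving shortens later searches
            a = parent[a];
        }
        return a;
    }
}
```

With the collisions (2, 4), (2, 5), (2, 6) of Fig. 10.4, resolve(..., 7) maps labels 4, 5, and 6 to 2, while labels 3 and 7 map to themselves.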

Once all the distinct labels within a single region have been collected, the labels of all the pixels in the region are updated so they carry the same label (e.g., the smallest label number in the region), as depicted in Fig. 10.5. Figure 10.6 shows the complete segmentation with some region statistics that can be easily calculated from the labeling data.
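The statistics listed in Fig. 10.6 (area, bounding box, centroid) can be accumulated in a single pass over the final label image. A minimal sketch, assuming region labels ≥ 2 in an int[][] indexed img[v][u]; the class RegionStats is hypothetical, and the centroid is simply the mean of the pixel coordinates of each region.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: per-region area, bounding box, and centroid from a label image. */
class RegionStats {
    int area = 0;
    int left = Integer.MAX_VALUE, top = Integer.MAX_VALUE;
    int right = Integer.MIN_VALUE, bottom = Integer.MIN_VALUE;
    long sumU = 0, sumV = 0;

    double centroidU() { return (double) sumU / area; }
    double centroidV() { return (double) sumV / area; }

    /** Collects statistics for every label >= 2 in img[v][u], keyed by label. */
    static Map<Integer, RegionStats> compute(int[][] img) {
        Map<Integer, RegionStats> stats = new TreeMap<>();
        for (int v = 0; v < img.length; v++) {
            for (int u = 0; u < img[v].length; u++) {
                int label = img[v][u];
                if (label < 2) continue;                 // background (0) or unlabeled (1)
                RegionStats s = stats.computeIfAbsent(label, k -> new RegionStats());
                s.area++;
                s.sumU += u;
                s.sumV += v;
                s.left = Math.min(s.left, u);
                s.right = Math.max(s.right, u);
                s.top = Math.min(s.top, v);
                s.bottom = Math.max(s.bottom, v);
            }
        }
        return stats;
    }
}
```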


Fig. 10.5

Sequential region labeling: final result after step 2. All equivalent labels have been replaced by the smallest label within that region.

Fig. 10.6

Example of a complete region labeling. The pixels within each region have been colored according to the consecutive label values 2, 3, ..., 10 they were assigned. The corresponding region statistics are shown in the table (total image size is 1212 × 836).

Label   Area (pixels)   Bounding box (left, top, right, bottom)   Centroid (uc, vc)
  2         14978       ( 887,  21, 1144, 399)                    (1049.7, 242.8)
  3         36156       (  40,  37,  438, 419)                    ( 261.9, 209.5)
  4         25904       ( 464, 126,  841, 382)                    ( 680.6, 240.6)
  5          2024       ( 387, 281,  442, 341)                    ( 414.2, 310.6)
  6          2293       ( 244, 367,  342, 506)                    ( 294.4, 439.0)
  7          4394       ( 406, 400,  507, 512)                    ( 454.1, 457.3)
  8         29777       ( 510, 416,  883, 765)                    ( 704.9, 583.9)
  9         20724       ( 833, 497, 1168, 759)                    (1016.0, 624.1)
 10         16566       (  82, 558,  411, 821)                    ( 208.7, 661.6)
10.1.3 Region Labeling: Summary

In this section, we have described a selection of algorithms for finding and labeling connected regions in images. We saw that the elegant idea of labeling individual regions using a simple recursive flood-filling method (Sec. 10.1.1) was not useful because of practical limitations on the depth of recursion and the high memory costs associated with it. We also saw that classical sequential region labeling (Sec. 10.1.2) is relatively complex and offers no real advantage over iterative implementations of the depth-first and breadth-first methods. In practice, the iterative breadth-first method is generally the best choice for large and complex images. In the following section, we present a modern and efficient algorithm that performs region labeling and also delineates the regions’ contours. Since contours are required in many applications, this combined approach is highly practical.


10.2 Region Contours

Once the regions in a binary image have been found, the next step is often to find the contours (that is, the outlines) of the regions. Like so many other tasks in image processing, at first glance this appears to be an easy one: simply follow along the edge of the region. We will see that, in actuality, describing this apparently simple process algorithmically requires careful thought, which has made contour finding one of the classic problems in image analysis.

10.2.1 External and Internal Contours

As we discussed in Chapter 9, Sec. 9.2.7, the pixels along the edge of a binary region (i.e., its border) can be identified using simple morphological operations and difference images. It must be stressed, however, that this process only marks the pixels along the contour, which is useful, for instance, for display purposes. In this section, we will go one step further and develop an algorithm for obtaining an ordered sequence of border pixel coordinates for describing a region’s contour. Note that a connected image region has exactly one outer contour, yet, due to holes, it can have arbitrarily many inner contours. Within such holes, smaller regions may be found, which will again have their own outer contours, and in turn these regions may themselves contain further holes with even smaller regions, and so on in a recursive manner (Fig. 10.7). An additional complication arises when regions are connected by parts that taper down to the width of a single pixel. In such cases, the contour can run through the same pixel more than once and from different directions (Fig. 10.8). Therefore, when tracing a contour from a start point xs, returning to the start point is not a sufficient condition for terminating the contour-tracing process. Other factors, such as the current direction along which contour points are being traversed, must be taken into account.

One apparently simple way of determining a contour is to proceed in analogy to the two-stage process presented in Sec. 10.1; that is, to first identify the connected regions in the image and second, for each region, proceed around it, starting from a pixel selected from its border. In the same way, an internal contour can be found by starting at a border pixel of a region’s hole. A wide range of algorithms based on first finding the regions and then following along their contours have been published, including [202], [180, pp. 142–148], and [214, p. 296].

Fig. 10.7

Binary image with outer and inner contours. The outer contour lies along the outside of the foreground region (dark). The inner contour surrounds the space within the region (a hole), which may contain further regions, and so on. (Panel labels: outer contour, inner contour.)

Fig. 10.8

The path along a contour as an ordered sequence of pixel coordinates with a given start point xs. Individual pixels may occur (be visited) more than once within the path, and a region consisting of a single isolated pixel will also have a contour (bottom right).

As a modern alternative, we present the following algorithm that, in contrast to the aforementioned classical methods, combines contour finding and region labeling in a single process.

10.2.2 Combining Region Labeling and Contour Finding

This method, based on [47], combines the concepts of sequential region labeling (Sec. 10.1) and traditional contour tracing into a single algorithm able to perform both tasks simultaneously during a single pass through the image. It identifies and labels regions and at the same time traces both their inner and outer contours. The algorithm does not require any complicated data structures and is relatively efficient when compared to other methods with similar capabilities. The key steps of this method are described here and illustrated in Fig. 10.9:

1. As in the sequential region labeling (Alg. 10.2), the binary image I is traversed from the top left to the bottom right. Such a traversal ensures that all pixels in the image are eventually examined and assigned an appropriate label.
Fig. 10.9

Combined region labeling and contour following (after [47]). The image in (a) is traversed from the top left to the lower right, one row at a time. In (b), the first foreground pixel A on the outer edge of the region is found. Starting from point A, the pixels on the edge along the outer contour are visited and labeled until A is reached again (c). Labels picked up at the outer contour are propagated along the image line inside the region (d). In (e), B was found as the first point on the inner contour. Now the inner contour is traversed in clockwise direction, marking the contour pixels until point B is reached again (f). The same tracing process is used as in step (c), with the inside of the region always lying to the right of the contour path. In (g) a previously marked point C on an inner contour is detected. Its label is again propagated along the image line inside the region. The final result is shown in (h).
2. At a given position in the image, the following cases may occur:

   Case A: The transition from a background pixel to a previously unmarked foreground pixel means that this pixel lies on the outer edge of a new region. A new label is assigned and the associated outer contour is traversed and marked by calling the method TraceContour (see Alg. 10.3 and Fig. 10.9(b–c)). Furthermore, all background pixels directly bordering the region are marked with the special label −1.

   Case B: The transition from a foreground pixel B to an unmarked background pixel means that B lies on an inner contour (Fig. 10.9(e)). Starting from B, the inner contour is traversed and its pixels are marked with labels from the surrounding region (Fig. 10.9(f)). Also, all bordering background pixels are again assigned the special label value −1.

   Case C: When a foreground pixel does not lie on a contour, then the neighboring pixel to the left has already been labeled (Fig. 10.9(d, g)), and this label is propagated to the current pixel.

In Algs. 10.3–10.4, the entire procedure is presented again and explained precisely. Procedure RegionContourLabeling traverses the image line by line and calls procedure TraceContour whenever a new inner or outer contour must be traced. The labels of the contour pixels are stored in the “label map” L (a rectangular array of the same size as the image) by procedure TraceContour, while the neighboring background pixels examined along the way are marked with −1 in L by procedure FindNextContourPoint in Alg. 10.4.
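As an illustration, Algs. 10.3–10.4 can be condensed into a single Java class. This is a minimal sketch, not the book's RegionContourLabeling class: the input is assumed to be an int[][] indexed bin[v][u], contours are returned as plain lists of {u, v} points instead of Contour objects, and the class and method names are hypothetical. Following the padding idea described for the Java implementation in Sec. 10.2.3, the image and label map are enlarged by a one-pixel background border so that FindNextContourPoint never indexes outside the arrays. The direction offsets are assumed to be numbered 0 = east and increasing clockwise (with the v axis pointing down).

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of combined region labeling and contour tracing after
 * Algs. 10.3-10.4. Input bin[v][u]: 0 = background, > 0 = foreground.
 * Contour points are {u, v} pairs in original (unpadded) image coordinates.
 */
class ContourLabelingSketch {
    // Delta(d) for d = 0..7 (the v axis points down)
    private static final int[] DU = { 1, 1, 0, -1, -1, -1, 0, 1 };
    private static final int[] DV = { 0, 1, 1, 1, 0, -1, -1, -1 };

    final List<List<int[]>> outerContours = new ArrayList<>();
    final List<List<int[]>> innerContours = new ArrayList<>();
    private final int[][] img;     // padded binary image I
    private final int[][] labels;  // padded label map L (0 = unvisited, -1 = visited background)

    ContourLabelingSketch(int[][] bin) {
        int h = bin.length, w = bin[0].length;
        img = new int[h + 2][w + 2];               // one-pixel background border
        labels = new int[h + 2][w + 2];            // Java initializes L to 0
        for (int v = 0; v < h; v++)
            for (int u = 0; u < w; u++)
                img[v + 1][u + 1] = (bin[v][u] > 0) ? 1 : 0;

        int r = 0;                                 // region counter
        for (int v = 0; v < h + 2; v++) {          // scan top to bottom
            int label = 0;
            for (int u = 0; u < w + 2; u++) {      // scan left to right
                if (img[v][u] > 0) {               // foreground pixel
                    if (label != 0) {              // continue existing region
                        labels[v][u] = label;
                    } else {
                        label = labels[v][u];
                        if (label == 0) {          // hit a new outer contour
                            label = ++r;
                            outerContours.add(traceContour(u, v, 0, label));
                            labels[v][u] = label;
                        }
                    }
                } else if (label != 0) {           // background pixel right after a region
                    if (labels[v][u] == 0)         // hit a new inner contour
                        innerContours.add(traceContour(u - 1, v, 1, label));
                    label = 0;
                }
            }
        }
    }

    /** Label of pixel (u, v) in original coordinates (0 or -1 for background). */
    int getLabel(int u, int v) {
        return labels[v + 1][u + 1];
    }

    /** TraceContour: follows the contour starting at (us, vs) with initial direction ds. */
    private List<int[]> traceContour(int us, int vs, int ds, int label) {
        List<int[]> c = new ArrayList<>();
        int[] t = findNextContourPoint(us, vs, ds);    // {u, v, d}
        int uT = t[0], vT = t[1], d = t[2];
        c.add(new int[] { uT - 1, vT - 1 });
        int uc = uT, vc = vT;
        boolean done = (us == uT && vs == vT);         // isolated pixel?
        while (!done) {
            labels[vc][uc] = label;
            int[] n = findNextContourPoint(uc, vc, (d + 6) % 8);
            int up = uc, vp = vc;                      // previous position
            uc = n[0]; vc = n[1]; d = n[2];            // current position
            done = (up == us && vp == vs && uc == uT && vc == vT);  // back at start?
            if (!done) c.add(new int[] { uc - 1, vc - 1 });
        }
        return c;
    }

    /** FindNextContourPoint: returns {u, v, d} of the next contour point. */
    private int[] findNextContourPoint(int u, int v, int d) {
        for (int i = 0; i < 7; i++) {                  // search in 7 directions
            int un = u + DU[d], vn = v + DV[d];
            if (img[vn][un] == 0) {                    // background pixel
                labels[vn][un] = -1;                   // mark as visited
                d = (d + 1) % 8;
            } else {
                return new int[] { un, vn, d };        // found a non-background pixel
            }
        }
        return new int[] { u, v, d };                  // no next point: isolated pixel
    }
}
```

For the 3 × 3 ring {{1,1,1},{1,0,1},{1,1,1}}, this sketch finds one outer contour with 8 points and one inner contour with 4 points, and all ring pixels receive label 1; two separate single pixels each yield an outer contour consisting of one point.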


10.2.3 Java Implementation

The Java implementation of the combined region labeling and contour tracing algorithm can be found online in class RegionContourLabeling⁴ (for details see Sec. 10.9). It follows Algs. 10.3–10.4 almost exactly, except that the image I and the associated label map L are initially padded (i.e., enlarged) by a surrounding layer of background pixels. This simplifies the process of tracing the outer region contours, since no special treatment is needed at the image borders. Program 10.2 shows a minimal example of its usage within the run() method of an ImageJ plugin (class Trace_Contours).


Examples

This combined algorithm for region marking and contour following is particularly well suited for processing large binary images since it is efficient and has only modest memory requirements. Figure 10.10 shows a synthetic test image that illustrates a number of special situations, such as isolated pixels and thin sections, which the algorithm must deal with correctly when following the contours. In the resulting plot, outer contours are shown as black polygon lines running through the centers of the contour pixels, and inner contours are drawn white. Contours of single-pixel regions are marked by small circles filled with the corresponding color. Figure 10.11 shows the results for a larger section taken from a real image (Fig. 9.12).


4 Package imagingbook.pub.regions.
1:  RegionContourLabeling(I)
    Input: I, a binary image with 0 = background, 1 = foreground.
    Returns sequences of outer and inner contours and a map of region labels.
2:      (M, N) ← Size(I)
3:      Cout ← ( )                                ▷ empty list of outer contours
4:      Cin ← ( )                                 ▷ empty list of inner contours
5:      Create map L: M × N ↦ ℤ                   ▷ create the label map L
6:      for all (u, v) do
7:          L(u, v) ← 0                           ▷ initialize L to zero
8:      r ← 0                                     ▷ region counter
9:      for v ← 0, ..., N−1 do                    ▷ scan the image top to bottom
10:         label ← 0
11:         for u ← 0, ..., M−1 do                ▷ scan the image left to right
12:             if I(u, v) > 0 then               ▷ I(u, v) is a foreground pixel
13:                 if (label ≠ 0) then           ▷ continue existing region
14:                     L(u, v) ← label
15:                 else
16:                     label ← L(u, v)
17:                     if (label = 0) then       ▷ hit a new outer contour
18:                         r ← r + 1
19:                         label ← r
20:                         xs ← (u, v)
21:                         C ← TraceContour(xs, 0, label, I, L)   ▷ outer contour
22:                         Cout ← Cout ⌢ (C)     ▷ collect outer contour
23:                         L(u, v) ← label
24:             else                              ▷ I(u, v) is a background pixel
25:                 if (label ≠ 0) then
26:                     if (L(u, v) = 0) then     ▷ hit new inner contour
27:                         xs ← (u−1, v)
28:                         C ← TraceContour(xs, 1, label, I, L)   ▷ inner contour
29:                         Cin ← Cin ⌢ (C)       ▷ collect inner contour
30:                     label ← 0
31:     return (Cout, Cin, L)

    continued in Alg. 10.4 ▷▷

Alg. 10.3

Combined contour tracing and region labeling (part 1). Given a binary image I, the application of RegionContourLabeling(I) returns a set of contours and an array containing region labels for all pixels in the image. When a new point on either an outer or inner contour is found, an ordered list of the contour’s points is constructed by calling procedure TraceContour (line 21 and line 28). TraceContour itself is described in Alg. 10.4.

Fig. 10.10

Combined contour and region marking. Original image, with foreground pixels marked green (a); located contours with black lines for outer and white lines for inner contours (b). Contour polygons pass through the pixel centers. Outer contours of single-pixel regions (e.g., in the upper-right of (b)) are marked by a single dot.
Alg. 10.4

Combined contour finding and region labeling (part 2, continued from Alg. 10.3). Starting from xs, the procedure TraceContour traces along the contour in the direction ds = 0 for outer contours or ds = 1 for inner contours. During this process, all contour points as well as neighboring background points are marked in the label array L. Given a point xc, TraceContour uses FindNextContourPoint() to determine the next point along the contour (line 9). The function Delta(d) returns the coordinate offset to the neighboring pixel in search direction d.


Prog. 10.2

Example of using the class RegionContourLabeling (plugin Trace_Contours). First (in line 9) a new instance of RegionContourLabeling is created for the input image I. The segmentation into regions and contours is done by the constructor. In lines 11–12 the outer and inner contours are retrieved as (possibly empty) lists of type Contour. Finally, the list of connected regions is obtained in line 14.


1:  TraceContour(xs, ds, label, I, L)
    Input: xs, start position; ds, initial search direction; label, the label assigned to this contour; I, the binary input image; L, label map.
    Returns a new outer or inner contour (sequence of points) starting at xs.
2:      (xT, d) ← FindNextContourPoint(xs, ds, I, L)
3:      c ← (xT)                         ▷ new contour with the single point xT
4:      xp ← xs                          ▷ previous position xp = (up, vp)
5:      xc ← xT                          ▷ current position xc = (uc, vc)
6:      done ← (xs = xT)                 ▷ isolated pixel?
7:      while (¬done) do
8:          L(uc, vc) ← label
9:          (xn, d) ← FindNextContourPoint(xc, (d + 6) mod 8, I, L)
10:         xp ← xc
11:         xc ← xn
12:         done ← (xp = xs ∧ xc = xT)   ▷ back at starting position?
13:         if (¬done) then
14:             c ← c ⌢ (xn)             ▷ add point xn to contour c
15:     return c                         ▷ return this contour

16: FindNextContourPoint(x, d, I, L)
    Input: x, initial position; d ∈ [0, 7], search direction; I, binary input image; L, the label map.
    Returns the next point on the contour and the modified search direction.
17:     for i ← 0, ..., 6 do             ▷ search in 7 directions
18:         xn ← x + Delta(d)
19:         if I(xn) = 0 then            ▷ I(un, vn) is a background pixel
20:             L(xn) ← −1               ▷ mark background as visited (−1)
21:             d ← (d + 1) mod 8
22:         else                         ▷ found a non-background pixel at xn
23:             return (xn, d)
24:     return (x, d)                    ▷ found no next node, return start position

25: Delta(d) := (Δu, Δv), with

        d     0   1   2   3   4   5   6   7
        Δu    1   1   0  −1  −1  −1   0   1
        Δv    0   1   1   1   0  −1  −1  −1
import ij.process.ByteProcessor;
import ij.process.ImageProcessor;
import imagingbook.pub.regions.BinaryRegion;
import imagingbook.pub.regions.Contour;
import imagingbook.pub.regions.RegionContourLabeling;
import java.util.List;

public void run(ImageProcessor ip) {
    // Make sure we have a proper byte image:
    ByteProcessor I = ip.convertToByteProcessor();

    // Create the region labeler / contour tracer:
    RegionContourLabeling seg = new RegionContourLabeling(I);

    // Get all outer/inner contours and connected regions:
    List<Contour> outerContours = seg.getOuterContours();
    List<Contour> innerContours = seg.getInnerContours();
    List<BinaryRegion> regions = seg.getRegions();
}
Fig. 10.11

Example of a complex contour (original image in Ch. 9, Fig. 9.12). Outer contours are marked in black and inner contours in white.


10.3 Representing Image Regions

10.3.1 Matrix Representation

A natural representation for images is a matrix (i.e., a two-dimensional array) in which elements represent the intensity or the color at a corresponding position in the image. In most programming languages, this representation maps simply and elegantly onto two-dimensional arrays, which makes working with raster images very natural. One possible disadvantage of this representation is that it does not depend on the content of the image. In other words, it makes no difference whether the image contains only a pair of lines or a complex scene, because the amount of memory required is constant and depends only on the dimensions of the image.

Regions in an image can be represented using a logical mask in which the area within the region is assigned the value true and the area outside the value false (Fig. 10.12). Since these values can be represented by a single bit, such a matrix is often referred to as a “bitmap”.5
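Because Java has no 1-bit data type (see footnote 5), a packed bitmap can be sketched with java.util.BitSet, which stores one bit per index. The class name BitMask and the row-major index v · width + u are assumptions made for this illustration; they are not part of the book's library.

```java
import java.util.BitSet;

/** Hypothetical sketch: a region mask stored with one bit per pixel in a java.util.BitSet. */
class BitMask {
    private final int width, height;
    private final BitSet bits;

    BitMask(int width, int height) {
        this.width = width;
        this.height = height;
        this.bits = new BitSet(width * height);   // all bits start out false
    }

    /** Marks pixel (u, v) as inside (true) or outside (false) the region. */
    void set(int u, int v, boolean inside) {
        bits.set(index(u, v), inside);
    }

    /** Returns true if pixel (u, v) belongs to the region. */
    boolean get(int u, int v) {
        return bits.get(index(u, v));
    }

    /** Number of region pixels, i.e., bits that are set. */
    int area() {
        return bits.cardinality();
    }

    /** Row-major bit index of (u, v), with a bounds check. */
    private int index(int u, int v) {
        if (u < 0 || u >= width || v < 0 || v >= height)
            throw new IndexOutOfBoundsException("(" + u + ", " + v + ")");
        return v * width + u;
    }
}
```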

10.3.2 Run Length Encoding

In run length encoding (RLE), sequences of adjacent foreground pixels can be represented compactly as “runs”. A run, or contiguous block, is a maximal length sequence of adjacent pixels of the same type within either a row or a column. Runs of arbitrary length can be encoded compactly using three integers,

$$\mathit{Run}_i = \langle \mathit{row}_i, \mathit{column}_i, \mathit{length}_i \rangle,$$

as illustrated in Fig. 10.13. When representing a sequence of runs within the same row, the number of the row is redundant and can be left out. Also, in some applications, it is more useful to record the coordinate of the end column instead of the length of the run.

5 Java does not provide a genuine 1-bit data type. Even variables of type boolean are represented internally (i.e., within the Java virtual machine) as 32-bit ints, and boolean arrays typically occupy one byte per element.

Fig. 10.12

Use of a binary mask to specify a region of an image: original image (a), logical (bit) mask (b), and masked image (c).


Fig. 10.13

Run length encoding in row

Since the RLE representation can be easily implemented and efficiently computed, it has long been used as a simple lossless compression method. It forms the foundation for fax transmission and can be found in a number of other important codecs, including TIFF and JPEG. In addition, RLE provides precomputed information about the image that can be used directly when computing certain properties of the image (for example, statistical moments; see Sec. 10.5.2).


10.3.3  Chain  Codes 

Regions can be represented not only using their interiors but also by their contours. Chain codes, which are often referred to as Freeman codes [79], are a classical method of contour encoding. In this encoding, the contour beginning at a given start point $x_S$ is represented by the sequence of directional changes it describes on the discrete image grid (Fig. 10.14).

Absolute chain code

For a closed contour of a region $\mathcal{R}$, described by the sequence of points $C_\mathcal{R} = (x_0, x_1, \ldots, x_{M-1})$ with $x_i = (u_i, v_i)$, we create the elements of its chain code sequence $c'_\mathcal{R} = (c'_0, c'_1, \ldots, c'_{M-1})$ with

$$c'_i = \mathrm{Code}(u'_i, v'_i), \tag{10.3}$$

where

$$(u'_i, v'_i) = \begin{cases} (u_{i+1} - u_i,\; v_{i+1} - v_i) & \text{for } 0 \le i < M-1,\\ (u_0 - u_i,\; v_0 - v_i) & \text{for } i = M-1, \end{cases} \tag{10.4}$$

and $\mathrm{Code}(u', v')$ is defined (assuming an 8-connected neighborhood) by the following table:

u'             1    1    0   -1   -1   -1    0    1
v'             0    1    1    1    0   -1   -1   -1
Code(u',v')    0    1    2    3    4    5    6    7

[Fig. 10.14 panels: (a) 4-chain code 3223222322303303...111, length = 28; (b) 8-chain code 54544546767...222, length = 16 + 6√2 ≈ 24.5]

Chain codes are compact since, instead of storing the absolute coordinates for every point on the contour, only those of the starting point are recorded. Each remaining point is encoded relative to its predecessor by indicating in which of the eight possible directions it lies. Since only 3 bits are required to encode these eight directions, the values can be stored using a smaller numeric type.
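As a small sketch (hypothetical class ChainCode, with points given as int pairs {u, v}), the following computes the absolute 8-chain code of Eqns. (10.3) and (10.4) using the Code() table above, and decodes it again from the start point. It assumes the contour is closed and that consecutive points are 8-neighbors; any other step is rejected with an exception.

```java
/** Hypothetical absolute 8-chain code encoder/decoder. */
final class ChainCode {
    // Direction offsets (u', v') for codes 0..7, following the Code() table (8-neighborhood).
    static final int[] DU = { 1, 1, 0, -1, -1, -1,  0,  1 };
    static final int[] DV = { 0, 1, 1,  1,  0, -1, -1, -1 };

    /** Code(u', v'): the direction code 0..7 for a single 8-neighbor step. */
    static int code(int du, int dv) {
        for (int c = 0; c < 8; c++)
            if (DU[c] == du && DV[c] == dv) return c;
        throw new IllegalArgumentException("not an 8-neighbor step: (" + du + "," + dv + ")");
    }

    /** Absolute chain code of a closed contour given as points[i] = {u_i, v_i}. */
    static int[] encode(int[][] points) {
        int M = points.length;
        int[] c = new int[M];
        for (int i = 0; i < M; i++) {
            int[] p = points[i];
            int[] q = points[(i + 1) % M];   // for i = M-1 this wraps around to x_0
            c[i] = code(q[0] - p[0], q[1] - p[1]);
        }
        return c;
    }

    /** Reconstructs the contour points from a start point (u0, v0) and an absolute chain code. */
    static int[][] decode(int u0, int v0, int[] c) {
        int[][] pts = new int[c.length][];
        int u = u0, v = v0;
        for (int i = 0; i < c.length; i++) {
            pts[i] = new int[] { u, v };
            u += DU[c[i]];
            v += DV[c[i]];
        }
        return pts;
    }
}
```

The codes lie in 0..7, so a byte[] (or 3 bits per element) would suffice for storage; int[] is used here only for simplicity.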


Differential chain code

Directly comparing two regions represented using chain codes is difficult since the description depends on the selected starting point $x_S$, and, for instance, simply rotating the region by 90° results in a completely different chain code. When using a differential chain code, the situation improves slightly. Instead of encoding the difference in the position of the next contour point, the change in the direction along the discrete contour is encoded. A given absolute chain code $c'_\mathcal{R} = (c'_0, c'_1, \ldots, c'_{M-1})$ can be converted element by element to a differential chain code $c''_\mathcal{R} = (c''_0, c''_1, \ldots, c''_{M-1})$, with6

$$c''_i = \begin{cases} (c'_{i+1} - c'_i) \bmod 8 & \text{for } 0 \le i < M-1,\\ (c'_0 - c'_i) \bmod 8 & \text{for } i = M-1, \end{cases} \tag{10.5}$$

6 For the implementation of the mod operator see Sec. F.1.2 in the Appendix.


Fig. 10.14

Chain codes with 4- and 8-connected neighborhoods. To compute a chain code, begin traversing the contour from a given starting point $x_S$. Encode the relative position between adjacent contour points using the directional code for either 4-connected (left) or 8-connected (right) neighborhoods. The length of the resulting path, calculated as the sum of the individual segments, can be used to approximate the true length of the contour.
again under the assumption of an 8-connected neighborhood. The element $c''_i$ thus describes the change in direction (curvature) of the contour between two successive segments $c'_i$ and $c'_{i+1}$ of the original chain code $c'_\mathcal{R}$. For the contour in Fig. 10.14(b), for example, the result is

$$c'_\mathcal{R} = (5, 4, 5, 4, 4, 5, 4, 6, 7, 6, 7, \ldots, 2, 2, 2),$$
$$c''_\mathcal{R} = (7, 1, 7, 0, 1, 7, 2, 1, 7, 1, 1, \ldots, 0, 0, 3).$$

Given the start position $x_S$ and the (absolute) initial direction $c'_0$, the original contour can be unambiguously reconstructed from the differential chain code.
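The conversion of Eqn. (10.5) and its inverse can be sketched as follows (hypothetical class name). Java's Math.floorMod() is used because the % operator can return negative values, which is exactly the pitfall the mod footnote refers to.

```java
/** Hypothetical conversion between absolute and differential 8-chain codes. */
final class DifferentialChainCode {
    /** Converts an absolute 8-chain code c' into the differential code c'' (Eqn. (10.5)). */
    static int[] toDifferential(int[] c) {
        int M = c.length;
        int[] d = new int[M];
        for (int i = 0; i < M; i++)
            d[i] = Math.floorMod(c[(i + 1) % M] - c[i], 8); // i = M-1 wraps around to c'_0
        return d;
    }

    /** Recovers the absolute code from the differential code and the initial direction c'_0. */
    static int[] toAbsolute(int[] d, int c0) {
        int M = d.length;
        int[] c = new int[M];
        c[0] = c0;
        for (int i = 1; i < M; i++)
            c[i] = Math.floorMod(c[i - 1] + d[i - 1], 8); // c'_i = c'_{i-1} + c''_{i-1} (mod 8)
        return c;
    }
}
```

For instance, applying toDifferential() to the made-up closed code (5, 4, 5, 4, 4, 5, 4, 6, 7, 6, 7, 0) yields a sequence beginning 7, 1, 7, 0, 1, 7, 2, 1, 7, 1, 1, matching the start of the example above, and toAbsolute() with $c'_0 = 5$ restores the input.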

Shape  numbers 

While the differential chain code remains the same when a region is rotated by 90°, the encoding is still dependent on the selected starting point. If we want to determine the similarity of two contours of the same length M using their differential chain codes $c''_1$, $c''_2$, we must first ensure that the same start point was used when computing the codes. A method that is often used [15,88] is to interpret the elements $c''_i$ in the differential chain code as the digits of a number to the base b (b = 8 for an 8-connected contour or b = 4 for a 4-connected contour) and the numeric value

$$\mathrm{Val}(c''_\mathcal{R}) = c''_0 \cdot b^0 + c''_1 \cdot b^1 + \cdots + c''_{M-1} \cdot b^{M-1} = \sum_{i=0}^{M-1} c''_i \cdot b^i. \tag{10.6}$$

Then the sequence is shifted circularly until the numeric value of the corresponding number reaches a maximum. We use the expression $c''_\mathcal{R} \rhd k$ to denote the sequence $c''_\mathcal{R}$ being circularly shifted by k positions to the right.7 For example, for k = 2 this is

$$c''_\mathcal{R} = (0, 1, 3, 2, \ldots, 5, 3, 7, 4),$$
$$c''_\mathcal{R} \rhd 2 = (7, 4, 0, 1, 3, 2, \ldots, 5, 3),$$

and

$$k_{\max} = \operatorname*{argmax}_{0 \le k < M} \mathrm{Val}(c''_\mathcal{R} \rhd k) \tag{10.7}$$

denotes the shift required to maximize the corresponding arithmetic value. The resulting code sequence or shape number,

$$s_\mathcal{R} = c''_\mathcal{R} \rhd k_{\max}, \tag{10.8}$$

is normalized with respect to the starting point and can thus be directly compared element by element with other normalized code sequences. Since the function Val() in Eqn. (10.6) produces values that are in general too large to be actually computed, in practice the relation

$$\mathrm{Val}(c''_1) > \mathrm{Val}(c''_2)$$

is determined by comparing the lexicographic ordering between the sequences $c''_1$ and $c''_2$, so that the arithmetic values need not be computed at all. Because the last digit $c''_{M-1}$ carries the highest weight $b^{M-1}$, this comparison must start at the last element and proceed toward $c''_0$.

7 That is, $(c''_\mathcal{R} \rhd k)(i) = c''_\mathcal{R}\bigl((i - k) \bmod M\bigr)$.
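A brute-force sketch of the shape number (hypothetical class ShapeNumber) tries all M circular shifts and keeps the one with the largest Val(), comparing sequences digit by digit from the most significant element $c''_{M-1}$ downward instead of evaluating Eqn. (10.6). This takes O(M²) time; more efficient methods exist but are not shown here.

```java
/** Hypothetical brute-force shape number computation for differential chain codes. */
final class ShapeNumber {
    /** (d > k): circular right shift by k, result[i] = d[(i - k) mod M] (footnote 7). */
    static int[] shiftRight(int[] d, int k) {
        int M = d.length;
        int[] s = new int[M];
        for (int i = 0; i < M; i++)
            s[i] = d[Math.floorMod(i - k, M)];
        return s;
    }

    /**
     * Compares Val(a) and Val(b) for equal-length digit sequences without computing them.
     * Digit i has weight b^i, so the most significant digit is the last one and
     * the comparison runs from index M-1 down to 0.
     */
    static int compareVal(int[] a, int[] b) {
        for (int i = a.length - 1; i >= 0; i--)
            if (a[i] != b[i]) return Integer.compare(a[i], b[i]);
        return 0;
    }

    /** Shape number: the circular shift of d with maximal Val (Eqns. (10.7) and (10.8)). */
    static int[] shapeNumber(int[] d) {
        int[] best = d.clone();            // shift k = 0
        for (int k = 1; k < d.length; k++) {
            int[] s = shiftRight(d, k);
            if (compareVal(s, best) > 0) best = s;
        }
        return best;
    }
}
```

shiftRight() reproduces the k = 2 example above, and shapeNumber() returns (1, 0, 2) both for (1, 0, 2) and for its shifted version (2, 1, 0), illustrating the independence from the starting point.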

Unfortunately, comparisons based on chain codes are generally not very useful for determining the similarity between regions simply because rotations at arbitrary angles (≠ 90°) have too great an impact on a region’s code. In addition, chain codes are not capable of handling changes in size (scaling) or other distortions. Section 10.4 presents a number of tools that are more appropriate in these types of cases.

Fourier  shape  descriptors 

An elegant approach to describing contours is offered by so-called Fourier shape descriptors, which interpret the two-dimensional contour $C = (x_0, x_1, \ldots, x_{M-1})$ with $x_i = (u_i, v_i)$ as a sequence of values in the complex plane, where

$$z_i = (u_i + \mathrm{i} \cdot v_i) \in \mathbb{C}. \tag{10.9}$$

From this sequence, one obtains (using a suitable method of interpolation in case of an 8-connected contour) a discrete, one-dimensional periodic function $f(s) \in \mathbb{C}$ with a constant sampling interval over s, the path length around the contour. The coefficients of the 1D Fourier spectrum (see Sec. 18.3) of this function f(s) provide a shape description of the contour in frequency space, where the lower spectral coefficients deliver a gross description of the shape. The details of this classical method can be found, for example, in [88,97,126,128,222]. This technique is described in considerable detail in Chapter 26.
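To make the idea concrete, here is a naive O(M²) discrete Fourier transform of the complex sequence $z_i$ from Eqn. (10.9), written as a hypothetical ContourDFT class. It assumes the contour points are already uniformly spaced (no interpolation is performed), and the normalization $G_m = \frac{1}{M}\sum_k z_k e^{-\mathrm{i}2\pi mk/M}$ is just one common convention that may differ from the one used in Chapter 26.

```java
/** Hypothetical naive DFT of a contour interpreted as complex numbers z_k = u_k + i*v_k. */
final class ContourDFT {
    /**
     * Computes G_m = (1/M) * sum_k z_k * exp(-i*2*pi*m*k/M) for m = 0..M-1.
     * Returns {re[], im[]}. Assumes uniformly spaced contour samples.
     */
    static double[][] transform(double[] u, double[] v) {
        int M = u.length;
        double[] re = new double[M], im = new double[M];
        for (int m = 0; m < M; m++) {
            double sr = 0, si = 0;
            for (int k = 0; k < M; k++) {
                double phi = -2 * Math.PI * m * k / M;
                double c = Math.cos(phi), s = Math.sin(phi);
                // (u + i v)(c + i s) = (u c - v s) + i (u s + v c)
                sr += u[k] * c - v[k] * s;
                si += u[k] * s + v[k] * c;
            }
            re[m] = sr / M;
            im[m] = si / M;
        }
        return new double[][] { re, im };
    }
}
```

With this scaling, $G_0$ is the mean of the contour points. For the four corners of a unit square, (0,0), (1,0), (1,1), (0,1), the result is (up to rounding) $G_0 = 0.5 + 0.5\mathrm{i}$, $G_1 = -0.5 - 0.5\mathrm{i}$, and $G_2 = G_3 = 0$.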


10.4  Properties  of  Binary  Regions 

Imagine that you have to describe the contents of a digital image to another person over the telephone. One possibility would be to call out the value of each pixel in some agreed-upon order. A much simpler way, of course, would be to describe the image on the basis of its properties, for example, “a red rectangle on a blue background”, or at an even higher level such as “a sunset at the beach with two dogs playing in the sand”. While using such a description is simple and natural for us, it is not (yet) possible for a computer to generate these types of descriptions without human intervention. For computers, it is of course simpler to calculate the mathematical properties of an image or region and to use these as the basis for further classification. Using features to classify, be they images or other items, is a fundamental part of the field of pattern recognition, a research area with many applications in image processing and computer vision [64, 169, 228].


10.4.1  Shape  Features 

The comparison and classification of binary regions is widely used, for example, in optical character recognition (OCR) and for automating processes ranging from blood cell counting to quality control inspection of manufactured products on assembly lines. The analysis of binary regions turns out to be one of the simpler tasks for which many efficient algorithms have been developed and used to implement reliable applications that are in use every day.

By a feature of a region, we mean a specific numerical or qualitative measure that is computable from the values and coordinates of the pixels that make up the region. As an example, one of the simplest features is its size or area, that is, the number of pixels that make up a region. In order to describe a region in a compact form, different features are often combined into a feature vector. This vector is then used as a sort of “signature” for the region that can be used for classification or comparison with other regions. The best features are those that are simple to calculate and are not easily influenced (robust) by irrelevant changes, particularly translation, rotation, and scaling.


10.4.2  Geometric  Features 


A region $\mathcal{R}$ of a binary image can be interpreted as a two-dimensional distribution of foreground points $x_i = (u_i, v_i)$ on the discrete plane $\mathbb{Z}^2$, that is, as a set

$$\mathcal{R} = \{x_0, \ldots, x_{N-1}\} = \{(u_0, v_0), (u_1, v_1), \ldots, (u_{N-1}, v_{N-1})\}.$$

Most geometric properties are defined in such a way that a region is considered to be a set of pixels that, in contrast to the definition in Sec. 10.1, does not necessarily have to be connected.


Perimeter 

The perimeter (or circumference) of a region $\mathcal{R}$ is defined as the length of its outer contour, where $\mathcal{R}$ must be connected. As illustrated in Fig. 10.14, the type of neighborhood relation must be taken into account for this calculation. When using a 4-neighborhood, the measured length of the contour (except when that length is 1) will be larger than its actual length.

In the case of 8-neighborhoods, a good approximation is reached by weighting the horizontal and vertical segments with 1 and diagonal segments with $\sqrt{2}$. Given an 8-connected chain code $c'_\mathcal{R} = (c'_0, c'_1, \ldots, c'_{M-1})$, the perimeter of the region is arrived at by

$$\mathrm{Perimeter}(\mathcal{R}) = \sum_{i=0}^{M-1} \mathrm{length}(c'_i), \tag{10.10}$$

with

$$\mathrm{length}(c) = \begin{cases} 1 & \text{for } c = 0, 2, 4, 6,\\ \sqrt{2} & \text{for } c = 1, 3, 5, 7. \end{cases} \tag{10.11}$$

However, with this conventional method of calculation, the real perimeter $P(\mathcal{R})$ is systematically overestimated. As a simple remedy, an empirical correction factor of 0.95 works satisfactorily even for relatively small regions, that is,

$$P(\mathcal{R}) \approx 0.95 \cdot \mathrm{Perimeter}(\mathcal{R}). \tag{10.12}$$
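Eqns. (10.10) to (10.12) translate directly into code. The following is a minimal sketch with a hypothetical class name, taking an 8-chain code as an int[] of values 0 to 7.

```java
/** Hypothetical perimeter estimation from an 8-chain code. */
final class Perimeter {
    /** Length of one chain code step (Eqn. (10.11)): 1 for even codes, sqrt(2) for odd (diagonal) codes. */
    static double stepLength(int c) {
        return (c % 2 == 0) ? 1.0 : Math.sqrt(2.0);
    }

    /** Perimeter(R) as the sum of the step lengths (Eqn. (10.10)). */
    static double perimeter(int[] chainCode) {
        double sum = 0;
        for (int c : chainCode) sum += stepLength(c);
        return sum;
    }

    /** Corrected estimate P(R) = 0.95 * Perimeter(R) (Eqn. (10.12)). */
    static double correctedPerimeter(int[] chainCode) {
        return 0.95 * perimeter(chainCode);
    }
}
```

A chain code with 16 even and 6 odd elements, as in Fig. 10.14(b), gives 16 + 6√2 ≈ 24.49, and the corrected estimate is about 23.26.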


Area

The area of a binary region $\mathcal{R}$ can be found by simply counting the image pixels that make up the region, that is,

$$A(\mathcal{R}) = N = |\mathcal{R}|. \tag{10.13}$$

The area of a connected region without holes can also be approximated from its closed contour, defined by M coordinate points $(x_0, x_1, \ldots, x_{M-1})$, where $x_i = (u_i, v_i)$, using the Gaussian area formula for polygons:

$$A(\mathcal{R}) \approx \frac{1}{2} \cdot \Bigl| \sum_{i=0}^{M-1} \bigl( u_i \cdot v_{(i+1) \bmod M} - u_{(i+1) \bmod M} \cdot v_i \bigr) \Bigr|. \tag{10.14}$$

When the contour is already encoded as a chain code $c'_\mathcal{R} = (c'_0, c'_1, \ldots, c'_{M-1})$, then the region’s area can be computed (trivially) with Eqn. (10.14) by expanding $c'_\mathcal{R}$ into a sequence of contour points from an arbitrary starting point (e.g., (0, 0)). However, the area can also be calculated directly from the chain code representation without expanding the contour [263] (see also Exercise 10.12).

While simple region properties such as area and perimeter are not influenced (except for quantization errors) by translation and rotation of the region, they are definitely affected by changes in size, for example, when the object to which the region corresponds is imaged from different distances. However, as will be described, it is possible to specify combined features that are invariant to translation, rotation, and scaling as well.
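A sketch of the Gaussian (shoelace) formula of Eqn. (10.14) follows, with a hypothetical class name. Coordinates are passed as two int arrays, and long arithmetic guards against overflow for large coordinates.

```java
/** Hypothetical polygon area via the Gaussian (shoelace) formula. */
final class PolygonArea {
    /** Area of the closed polygon with vertices (u[i], v[i]), Eqn. (10.14). */
    static double area(int[] u, int[] v) {
        int M = u.length;
        long sum = 0;
        for (int i = 0; i < M; i++) {
            int j = (i + 1) % M;                       // next vertex, wrapping around
            sum += (long) u[i] * v[j] - (long) u[j] * v[i];
        }
        return Math.abs(sum) / 2.0;                    // abs makes the result orientation-independent
    }
}
```

Note that a contour traced through pixel centers encloses less area than the pixel count: the contour (0,0), (2,0), (2,2), (0,2) of a 3×3 pixel block gives an area of 4 rather than $A(\mathcal{R}) = 9$, which is why Eqn. (10.14) is only an approximation.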


Compactness  and  roundness 

Compactness is understood as the relation between a region’s area and its perimeter. We can use the fact that a region’s perimeter P increases linearly with the enlargement factor while the area A increases quadratically to see that, for a particular shape, the ratio $A/P^2$ should be the same at any scale. This ratio can thus be used as a feature that is invariant under translation, rotation, and scaling. When applied to a circular region of any diameter, this ratio has a value of $\frac{1}{4\pi}$ (since $A = \pi r^2$ and $P = 2\pi r$), so by normalizing it against a filled circle, we create a feature that is sensitive to the roundness or circularity of a region,

$$\mathrm{Circularity}(\mathcal{R}) = 4\pi \cdot \frac{A(\mathcal{R})}{P^2(\mathcal{R})}, \tag{10.15}$$

which results in a maximum value of 1 for a perfectly round region $\mathcal{R}$ and a value in the range [0, 1] for all other shapes (Fig. 10.15). If an absolute value for a region’s roundness is required, the corrected perimeter estimate (Eqn. (10.12)) should be employed. Figure 10.15 shows the circularity values of different regions as computed with the formulation in Eqn. (10.15).
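Circularity (Eqn. (10.15)) needs only the area and a perimeter estimate. In this hypothetical sketch, correctedCircularity() uses the corrected perimeter of Eqn. (10.12), which simply multiplies the result by $1/0.95^2 \approx 1.108$.

```java
/** Hypothetical circularity computation from area and perimeter estimates. */
final class Circularity {
    /** Circularity(R) = 4*pi*A / P^2 (Eqn. (10.15)). */
    static double circularity(double area, double perimeter) {
        return 4 * Math.PI * area / (perimeter * perimeter);
    }

    /** Same feature, but using the corrected perimeter 0.95 * Perimeter(R) (Eqn. (10.12)). */
    static double correctedCircularity(double area, double perimeter) {
        double p = 0.95 * perimeter;
        return circularity(area, p);
    }
}
```

This factor is consistent, up to rounding, with the value pairs in Fig. 10.15 (e.g., 0.078 · 1.108 ≈ 0.086). For comparison, an ideal square with $A = s^2$ and $P = 4s$ scores $\pi/4 \approx 0.785$.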


Fig. 10.15

Circularity values for different shapes. Shown are the corresponding estimates for $\mathrm{Circularity}(\mathcal{R})$ as defined in Eqn. (10.15). Corrected values calculated with Eqn. (10.12) are shown in parentheses.

(a) 0.904 (1.001)   (b) 0.607 (0.672)   (c) 0.078 (0.086)

Fig. 10.16

Example bounding box (a) and convex hull (b) of a binary image region.

Bounding box

The bounding box of a region $\mathcal{R}$ is the minimal axis-parallel rectangle that encloses all points of $\mathcal{R}$,

$$\mathrm{BoundingBox}(\mathcal{R}) = (u_{\min}, u_{\max}, v_{\min}, v_{\max}), \tag{10.16}$$

where $u_{\min}, u_{\max}$ and $v_{\min}, v_{\max}$ are the minimal and maximal coordinate values of all points $(u_i, v_i) \in \mathcal{R}$ in the x and y directions, respectively (Fig. 10.16(a)).
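Computing Eqn. (10.16) takes a single pass over the region points. The sketch below (hypothetical class BoundingBox, points given as {u, v} pairs) returns the four values in the same order as the equation.

```java
/** Hypothetical axis-parallel bounding box of a set of region points. */
final class BoundingBox {
    /** Returns {uMin, uMax, vMin, vMax} (Eqn. (10.16)) for points[i] = {u_i, v_i}. */
    static int[] of(int[][] points) {
        if (points.length == 0) throw new IllegalArgumentException("empty region");
        int uMin = Integer.MAX_VALUE, uMax = Integer.MIN_VALUE;
        int vMin = Integer.MAX_VALUE, vMax = Integer.MIN_VALUE;
        for (int[] p : points) {
            uMin = Math.min(uMin, p[0]);
            uMax = Math.max(uMax, p[0]);
            vMin = Math.min(vMin, p[1]);
            vMax = Math.max(vMax, p[1]);
        }
        return new int[] { uMin, uMax, vMin, vMax };
    }
}
```

For the points (3,1), (5,4), and (2,2), it returns (2, 5, 1, 4).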

Convex  hull 

The convex hull is the smallest convex polygon that contains all points of the region $\mathcal{R}$. A physical analogy is a board in which nails stick out in correspondence to each of the points in the region. If you were to place an elastic band around all the nails and then release it, it would contract into a convex hull around the nails (see Figs. 10.16(b) and 10.21(c)). Given N contour points, the convex hull can be computed in time $O(N \log V)$, where V is the number of vertices in the polygon of the resulting convex hull [17].

The convex hull is useful, for example, for determining the convexity or the density of a region. The convexity is defined as the ratio between the length of the convex hull and the original perimeter of the region. Density is then defined as the ratio between the area of the region and the area of its convex hull. The diameter, on the other hand, is the maximal distance between any two nodes on the convex hull.
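The $O(N \log V)$ bound quoted above is output-sensitive, since it depends on the number V of hull vertices. For illustration, the following sketch instead uses Andrew's monotone chain algorithm, a different and simpler method that runs in $O(N \log N)$ time because it sorts the points first. The class name is hypothetical, and collinear points on the hull boundary are dropped.

```java
import java.util.Arrays;

/** Hypothetical convex hull via Andrew's monotone chain algorithm. */
final class ConvexHull {
    /** Cross product (b - a) x (c - a); positive for a counterclockwise turn in x-right, y-up axes. */
    static long cross(int[] a, int[] b, int[] c) {
        return (long) (b[0] - a[0]) * (c[1] - a[1]) - (long) (b[1] - a[1]) * (c[0] - a[0]);
    }

    /** Returns the hull vertices of points[i] = {u_i, v_i} in order, without collinear points. */
    static int[][] of(int[][] points) {
        int n = points.length;
        if (n < 3) return points.clone();
        int[][] p = points.clone();
        Arrays.sort(p, (a, b) -> a[0] != b[0] ? Integer.compare(a[0], b[0]) : Integer.compare(a[1], b[1]));
        int[][] hull = new int[2 * n][];
        int k = 0;
        for (int i = 0; i < n; i++) {                   // lower hull, left to right
            while (k >= 2 && cross(hull[k - 2], hull[k - 1], p[i]) <= 0) k--;
            hull[k++] = p[i];
        }
        for (int i = n - 2, t = k + 1; i >= 0; i--) {   // upper hull, right to left
            while (k >= t && cross(hull[k - 2], hull[k - 1], p[i]) <= 0) k--;
            hull[k++] = p[i];
        }
        return Arrays.copyOf(hull, k - 1);              // the last point repeats the first
    }
}
```

For the square corners (0,0), (2,0), (2,2), (0,2) plus the interior point (1,1) and the edge point (1,0), it returns only the four corners, ordered counterclockwise with respect to x-right, y-up axes (which appears clockwise in image coordinates, where v points down). The hull's area, e.g., from Eqn. (10.14), then gives the density mentioned above.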


10.5  Statistical  Shape  Properties 

When computing statistical shape properties, we consider a region $\mathcal{R}$ to be a collection of coordinate points distributed within a two-dimensional space. Since statistical properties can be computed for point distributions that do not form a connected region, they can be applied before segmentation. The central moments of the region’s point distribution, which measure characteristic properties with respect to its midpoint or centroid, are an important concept in this context.


10.5.1  Centroid 


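The centroid or center of gravity of a connected region can be easily visualized. Imagine drawing the region on a piece of cardboard or tin and then cutting it out and attempting to balance it on the tip of your finger. The location on the region where you must place your finger in order for the region to balance is the centroid of the region.8

The centroid $\bar{\boldsymbol{x}} = (\bar{x}, \bar{y})^\mathsf{T}$ of a binary (not necessarily connected) region is the arithmetic mean of the point coordinates $\boldsymbol{x}_i = (u_i, v_i)$, that is,

$$\bar{\boldsymbol{x}} = \frac{1}{|\mathcal{R}|} \cdot \sum_{\boldsymbol{x}_i \in \mathcal{R}} \boldsymbol{x}_i \tag{10.17}$$

or

$$\bar{x} = \frac{1}{|\mathcal{R}|} \cdot \sum_{(u_i, v_i) \in \mathcal{R}} u_i, \qquad \bar{y} = \frac{1}{|\mathcal{R}|} \cdot \sum_{(u_i, v_i) \in \mathcal{R}} v_i. \tag{10.18}$$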
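Eqn. (10.18) amounts to averaging the coordinates. The sketch below uses a hypothetical Centroid class with points given as {u, v} pairs; because no connectivity is assumed, it also works for scattered points.

```java
/** Hypothetical centroid of a (not necessarily connected) set of region points. */
final class Centroid {
    /** Returns {xBar, yBar} (Eqn. (10.18)) for points[i] = {u_i, v_i}. */
    static double[] of(int[][] points) {
        if (points.length == 0) throw new IllegalArgumentException("empty region");
        double su = 0, sv = 0;
        for (int[] p : points) {
            su += p[0];
            sv += p[1];
        }
        return new double[] { su / points.length, sv / points.length };
    }
}
```

For the two isolated pixels (0,0) and (4,0), it returns (2, 0), a position that belongs to neither pixel, which illustrates the caveat in footnote 8.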


10.5.2 Moments

The formulation of the region’s centroid in Eqn. (10.18) is only a special case of the more general statistical concept of a moment. Specifically, the expression

$$m_{pq}(\mathcal{R}) = \sum_{(u,v) \in \mathcal{R}} I(u,v) \cdot u^p \cdot v^q \tag{10.19}$$

describes the (ordinary) moment of order p, q for a discrete (image) function $I(u,v) \in \mathbb{R}$, for example, a grayscale image. All the following definitions are also generally applicable to regions in grayscale images. The moments of connected binary regions can also be calculated directly from the coordinates of the contour points [212, p. 148].

In the special case of a binary image $I(u,v) \in \{0, 1\}$, only the foreground pixels with $I(u,v) = 1$ in the region $\mathcal{R}$ need to be considered, and therefore Eqn. (10.19) can be simplified to

$$m_{pq}(\mathcal{R}) = \sum_{(u,v) \in \mathcal{R}} u^p \cdot v^q. \tag{10.20}$$

In this way, the area of a binary region can be expressed as its zero-order moment,

$$A(\mathcal{R}) = |\mathcal{R}| = \sum_{(u,v) \in \mathcal{R}} 1 = \sum_{(u,v) \in \mathcal{R}} u^0 \cdot v^0 = m_{00}(\mathcal{R}), \tag{10.21}$$

and similarly the centroid $\bar{\boldsymbol{x}}$ in Eqn. (10.18) can be written as

$$\bar{x} = \frac{1}{|\mathcal{R}|} \cdot \sum_{(u,v) \in \mathcal{R}} u^1 \cdot v^0 = \frac{m_{10}(\mathcal{R})}{m_{00}(\mathcal{R})}, \qquad \bar{y} = \frac{1}{|\mathcal{R}|} \cdot \sum_{(u,v) \in \mathcal{R}} u^0 \cdot v^1 = \frac{m_{01}(\mathcal{R})}{m_{00}(\mathcal{R})}. \tag{10.22}$$

These moments thus represent concrete physical properties of a region. Specifically, the area $m_{00}$ is in practice an important basis for characterizing regions, and the centroid $(\bar{x}, \bar{y})$ permits the reliable and (within a fraction of a pixel) exact specification of a region’s position.

8 Assuming you did not imagine a region where the centroid lies outside of the region or within a hole in the region, which is of course possible.

10.5.3 Central Moments

To compute position-independent (translation-invariant) region features, the region’s centroid, which can be determined precisely in any situation, can be used as a reference point. In other words, we can shift the origin of the coordinate system to the region’s centroid $\bar{\boldsymbol{x}} = (\bar{x}, \bar{y})$ to obtain the central moments of order p, q:

$$\mu_{pq}(\mathcal{R}) = \sum_{(u,v) \in \mathcal{R}} I(u,v) \cdot (u - \bar{x})^p \cdot (v - \bar{y})^q. \tag{10.23}$$

For a binary image (with $I(u,v) = 1$ within the region $\mathcal{R}$), Eqn. (10.23) can be simplified to

$$\mu_{pq}(\mathcal{R}) = \sum_{(u,v) \in \mathcal{R}} (u - \bar{x})^p \cdot (v - \bar{y})^q. \tag{10.24}$$


10.5.4  Normalized  Central  Moments 


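Central moment values of course depend on the absolute size of the region since the value depends directly on the distance of all region points to its centroid. So, if a 2D shape is scaled uniformly by some factor $s \in \mathbb{R}$, its central moments multiply by the factor

$$s^{(p+q+2)}, \tag{10.25}$$

where $s^{p+q}$ stems from the scaled coordinate differences in each term and $s^2$ from the number of pixels, which grows with the area. Thus size-invariant “normalized” moments are obtained by scaling with the reciprocal of the area $A = \mu_{00} = m_{00}$ raised to the required power in the form

$$\hat{\mu}_{pq}(\mathcal{R}) = \mu_{pq}(\mathcal{R}) \cdot \left( \frac{1}{\mu_{00}(\mathcal{R})} \right)^{(p+q+2)/2}, \tag{10.26}$$

for $(p + q) \ge 2$ [126, p. 529].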


10.5.5  Java  Implementation 

Program  10.3  gives  a  direct  (brute  force)  Java  implementation  for 
computing  the  ordinary,  central,  and  normalized  central  moments 
for  binary  images  (BACKGROUND  =  0).  This  implementation  is  only 
meant  to  clarify  the  computation,  and  naturally  much  more  efficient 
implementations  are  possible  (see,  e.g.,  [131]). 
    import ij.process.ImageProcessor;   // ImageJ image type used below

    // The methods below are assumed to be members of a class (e.g., an ImageJ plugin).

    // Ordinary moment m_pq (Eqn. (10.20)):

    double moment(ImageProcessor I, int p, int q) {
        double Mpq = 0.0;
        for (int v = 0; v < I.getHeight(); v++) {
            for (int u = 0; u < I.getWidth(); u++) {
                if (I.getPixel(u, v) > 0) {   // foreground pixel (BACKGROUND = 0)
                    Mpq += Math.pow(u, p) * Math.pow(v, q);
                }
            }
        }
        return Mpq;
    }

    // Central moment mu_pq (Eqn. (10.24)):

    double centralMoment(ImageProcessor I, int p, int q) {
        double m00 = moment(I, 0, 0);          // region area
        double xCtr = moment(I, 1, 0) / m00;   // centroid (Eqn. (10.22))
        double yCtr = moment(I, 0, 1) / m00;
        double cMpq = 0.0;
        for (int v = 0; v < I.getHeight(); v++) {
            for (int u = 0; u < I.getWidth(); u++) {
                if (I.getPixel(u, v) > 0) {
                    cMpq += Math.pow(u - xCtr, p) * Math.pow(v - yCtr, q);
                }
            }
        }
        return cMpq;
    }

    // Normalized central moment (Eqn. (10.26)):

    double nCentralMoment(ImageProcessor I, int p, int q) {
        double m00 = moment(I, 0, 0);
        double norm = Math.pow(m00, 0.5 * (p + q + 2));
        return centralMoment(I, p, q) / norm;
    }

Prog. 10.3

Example of directly computing moments in Java. The methods moment(), centralMoment(), and nCentralMoment() compute for a binary image the moments $m_{pq}$, $\mu_{pq}$, and $\hat{\mu}_{pq}$ (Eqns. (10.20), (10.24), and (10.26)).


10.6  Moment-Based  Geometric  Properties 

While  normalized  moments  can  be  directly  applied  for  classifying 
regions,  further  interesting  and  geometrically  relevant  features  can 
be  elegantly  derived  from  statistical  region  moments. 

10.6.1  Orientation 

Fig. 10.17

Major axis of a region. Rotating an elongated region $\mathcal{R}$, interpreted as a physical body, around its major axis requires less effort (least moment of inertia) than rotating it around any other axis.

Orientation describes the direction of the major axis, that is, the axis that runs through the centroid and along the widest part of the region (Fig. 10.18(a)). Since rotating the region around the major axis requires less effort (smaller moment of inertia) than spinning it around any other axis, it is sometimes referred to as the major axis of rotation. As an example, when you hold a pencil between your hands and twist it around its major axis (that is, around the lead), the pencil exhibits the least mass inertia (Fig. 10.17). As long as a region exhibits an orientation at all (i.e., unless $\mu_{20}(\mathcal{R}) = \mu_{02}(\mathcal{R})$ and $\mu_{11}(\mathcal{R}) = 0$), the direction $\theta_\mathcal{R}$ of the major axis can be found directly from the central moments $\mu_{pq}$ as

$$\tan(2\theta_\mathcal{R}) = \frac{2 \cdot \mu_{11}(\mathcal{R})}{\mu_{20}(\mathcal{R}) - \mu_{02}(\mathcal{R})} \tag{10.27}$$

and thus the corresponding angle is

$$\theta_\mathcal{R} = \frac{1}{2} \cdot \tan^{-1}\!\left( \frac{2 \cdot \mu_{11}(\mathcal{R})}{\mu_{20}(\mathcal{R}) - \mu_{02}(\mathcal{R})} \right) \tag{10.28}$$

$$= \frac{1}{2} \cdot \mathrm{ArcTan}\bigl(\mu_{20}(\mathcal{R}) - \mu_{02}(\mathcal{R}),\; 2 \cdot \mu_{11}(\mathcal{R})\bigr). \tag{10.29}$$

Eqns. (10.27) and (10.28) require $\mu_{20}(\mathcal{R}) \ne \mu_{02}(\mathcal{R})$, whereas the two-argument form in Eqn. (10.29) also covers the case $\mu_{20}(\mathcal{R}) = \mu_{02}(\mathcal{R})$. The resulting angle $\theta_\mathcal{R}$ is in the range $[-\frac{\pi}{2}, \frac{\pi}{2}]$.9 Orientation measurements based on region moments are very accurate in general.
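A point-based sketch of Eqn. (10.29) follows. The hypothetical class Orientation first computes the central moments $\mu_{20}, \mu_{02}, \mu_{11}$ of Eqn. (10.24) from a list of {u, v} pairs and then applies Math.atan2(y, x), which corresponds to ArcTan(x, y) in the text (note the swapped argument order).

```java
/** Hypothetical helper: region orientation from the central moments of a point list. */
final class Orientation {
    /** Returns {mu20, mu02, mu11} (Eqn. (10.24)) for region points points[i] = {u_i, v_i}. */
    static double[] centralMoments(int[][] points) {
        double xc = 0, yc = 0;
        for (int[] p : points) { xc += p[0]; yc += p[1]; }
        xc /= points.length;                 // centroid, Eqn. (10.18)
        yc /= points.length;
        double mu20 = 0, mu02 = 0, mu11 = 0;
        for (int[] p : points) {
            double dx = p[0] - xc, dy = p[1] - yc;
            mu20 += dx * dx;
            mu02 += dy * dy;
            mu11 += dx * dy;
        }
        return new double[] { mu20, mu02, mu11 };
    }

    /** theta_R = 0.5 * ArcTan(mu20 - mu02, 2 * mu11), Eqn. (10.29); Java's atan2 takes (y, x). */
    static double angle(double mu20, double mu02, double mu11) {
        return 0.5 * Math.atan2(2 * mu11, mu20 - mu02);
    }
}
```

For the diagonal pixels (0,0), (1,1), (2,2), the angle is π/4, i.e., pointing down and to the right in image coordinates. If $\mu_{20} = \mu_{02}$ and $\mu_{11} = 0$, Math.atan2(0, 0) returns 0 even though the region has no distinct orientation, so callers may want to test for that case.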


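Calculating orientation vectors

When visualizing region properties, a frequent task is to plot the region’s orientation as a line or arrow, usually anchored at the center of gravity $\bar{\boldsymbol{x}} = (\bar{x}, \bar{y})^\mathsf{T}$, for example, by a parametric line of the form

$$\boldsymbol{x} = \bar{\boldsymbol{x}} + \lambda \cdot \boldsymbol{x}_d = \begin{pmatrix} \bar{x} \\ \bar{y} \end{pmatrix} + \lambda \cdot \begin{pmatrix} \cos(\theta_\mathcal{R}) \\ \sin(\theta_\mathcal{R}) \end{pmatrix}, \tag{10.30}$$

with the normalized orientation vector $\boldsymbol{x}_d$ and the length variable $\lambda > 0$. To find the unit orientation vector $\boldsymbol{x}_d = (\cos\theta, \sin\theta)^\mathsf{T}$, we could first compute the inverse tangent to get $2\theta$ (Eqn. (10.28)) and then compute the cosine and sine of $\theta$. However, the vector $\boldsymbol{x}_d$ can also be obtained without using trigonometric functions as follows. Rewriting Eqn. (10.27) as

$$\tan(2\theta_\mathcal{R}) = \frac{2 \cdot \mu_{11}(\mathcal{R})}{\mu_{20}(\mathcal{R}) - \mu_{02}(\mathcal{R})} = \frac{a}{b} = \frac{\sin(2\theta_\mathcal{R})}{\cos(2\theta_\mathcal{R})}, \tag{10.31}$$

we get (by Pythagoras’ theorem)

$$\sin(2\theta_\mathcal{R}) = \frac{a}{\sqrt{a^2 + b^2}} \quad\text{and}\quad \cos(2\theta_\mathcal{R}) = \frac{b}{\sqrt{a^2 + b^2}},$$

where $a = 2 \cdot \mu_{11}(\mathcal{R})$ and $b = \mu_{20}(\mathcal{R}) - \mu_{02}(\mathcal{R})$. Using the relations $\cos^2\alpha = \frac{1}{2}[1 + \cos(2\alpha)]$ and $\sin^2\alpha = \frac{1}{2}[1 - \cos(2\alpha)]$, we can compute the normalized orientation vector $\boldsymbol{x}_d = (x_d, y_d)^\mathsf{T}$ as

$$x_d = \cos(\theta_\mathcal{R}) = \begin{cases} 0 & \text{for } a = b = 0,\\ \Bigl[\frac{1}{2}\Bigl(1 + \frac{b}{\sqrt{a^2 + b^2}}\Bigr)\Bigr]^{1/2} & \text{otherwise}, \end{cases} \tag{10.32}$$

$$y_d = \sin(\theta_\mathcal{R}) = \begin{cases} 0 & \text{for } a = b = 0,\\ \Bigl[\frac{1}{2}\Bigl(1 - \frac{b}{\sqrt{a^2 + b^2}}\Bigr)\Bigr]^{1/2} & \text{for } a \ge 0,\\ -\Bigl[\frac{1}{2}\Bigl(1 - \frac{b}{\sqrt{a^2 + b^2}}\Bigr)\Bigr]^{1/2} & \text{for } a < 0, \end{cases} \tag{10.33}$$

straight from the central region moments $\mu_{11}(\mathcal{R})$, $\mu_{20}(\mathcal{R})$, and $\mu_{02}(\mathcal{R})$, as defined in Eqn. (10.31). The horizontal component ($x_d$) in Eqn. (10.32) is never negative, while the case switch in Eqn. (10.33) corrects the sign of the vertical component ($y_d$) to map to the same angular range $[-\frac{\pi}{2}, +\frac{\pi}{2}]$ as Eqn. (10.28). Except in the degenerate case $a = b = 0$, the resulting vector $\boldsymbol{x}_d$ is normalized (i.e., $\|(x_d, y_d)\| = 1$) and could be scaled arbitrarily for display purposes by a suitable length $\lambda$, for example, using the region’s eccentricity value described in Sec. 10.6.2 (see also Fig. 10.19).

9 See Sec. A.1 in the Appendix for the computation of angles with the ArcTan() (inverse tangent) function and Sec. F.1.6 for the corresponding Java method Math.atan2().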
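Eqns. (10.32) and (10.33) can be sketched as below (hypothetical class name). The Math.max(0, ...) guards are an addition not present in the equations; they only protect the square roots against tiny negative arguments caused by floating-point rounding.

```java
/** Hypothetical helper: unit orientation vector from central moments without trig calls. */
final class OrientationVector {
    /** Returns {x_d, y_d} per Eqns. (10.32) and (10.33); {0, 0} if a = b = 0. */
    static double[] of(double mu20, double mu02, double mu11) {
        double a = 2 * mu11;
        double b = mu20 - mu02;
        if (a == 0 && b == 0) return new double[] { 0, 0 };  // no distinct orientation
        double q = b / Math.sqrt(a * a + b * b);             // = cos(2 * theta)
        double xd = Math.sqrt(Math.max(0, 0.5 * (1 + q)));   // cos(theta), never negative
        double yd = Math.sqrt(Math.max(0, 0.5 * (1 - q)));   // |sin(theta)|
        return new double[] { xd, (a < 0) ? -yd : yd };      // sign of sin(theta) follows a
    }
}
```

For nondegenerate inputs the result agrees, up to rounding, with $(\cos\theta, \sin\theta)$ for $\theta = \frac{1}{2}\,\mathrm{atan2}(2\mu_{11}, \mu_{20} - \mu_{02})$; for example, $\mu_{20} = 3$, $\mu_{02} = 1$, $\mu_{11} = -1$ gives approximately (0.924, −0.383), i.e., $\theta = -\pi/8$.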


10.6.2  Eccentricity 

Similar to the region orientation, moments can also be used to determine the “elongatedness” or eccentricity of a region. A naive approach for computing the eccentricity could be to rotate the region until we can fit a bounding box (or enclosing ellipse) with a maximum aspect ratio. Of course, this process would be computationally intensive simply because of the many rotations required. If we know the orientation of the region (Eqn. (10.28)), then we may fit a bounding box that is parallel to the region’s major axis. In general, the proportions of the region’s bounding box are not a good eccentricity measure

Fig. 10.18

Region orientation and eccentricity. The major axis of the region extends through its center of gravity $\bar{\boldsymbol{x}}$ at the orientation $\theta$. Note that angles are in the range $[-\frac{\pi}{2}, +\frac{\pi}{2}]$ and increment in the clockwise direction because the y axis of the image coordinate system points downward (in this example, $\theta \approx -0.759 \approx -43.5°$). The eccentricity of the region is derived from the ratio between the lengths of the major axis ($r_a$) and the minor axis ($r_b$) of the “equivalent” ellipse.

the box.

Based  on  region  moments,  highly  accurate  and  stable  measures 
can  be  obtained  without  any  iterative  search  or  optimization.  Also, 
moment-based  methods  do  not  require  knowledge  of  the  boundary 
length  (as  required  for  computing  the  circularity  feature  in  Sec. 
10.4.2),  and  they  can  also  handle  nonconnected  regions  or  point 
clouds.  Several  different  formulations  of  region  eccentricity  can  be 
found  in  the  literature  [15,  126, 128]  (see  also  Exercise  10.17).  We 
adopt  the  following  definition  because  of  its  simple  geometrical  inter¬ 
pretation: 


Ecc  (K) 


a1 


M20  +  M02  +  V (^20  ~  M 02 )2  +  4  •  M n 

M20  +  M02  —  V (^20  —  M 02)2  +  4  •  Mil 


(10.34) 


where  a1  =  2A1?  a2  =  2A2  are  proportional  to  the  eigenvalues  A1?  A2 
(with  X1  >  A2)  of  the  symmetric  2x2  matrix 


( M20  Mil  \ 
\Mn  M02 /  ’ 


(10.35) 


with  the  region’s  central  moments  /xn,  /x20,  /i 02  (see  Eqn.  (10. 23)). 10 
The  values  of  Ecc  are  in  the  range  [1,  00),  where  Ecc  =  1  corresponds 
to  a  circular  disk  and  elongated  regions  have  values  >  1. 

The value returned by Ecc(R) is invariant to the region's orientation and size; that is, this quantity has the important property of being rotation and scale invariant. However, the values a1, a2 themselves contain relevant information about the spatial structure of the region. Geometrically, the eigenvalues λ1, λ2 (and thus a1, a2) directly relate to the proportions of the "equivalent" ellipse, positioned at the region's center of gravity (x̄, ȳ) and oriented at θ = θR (Eqn. (10.28)). The lengths of the major and minor axes, ra and rb, are

$$r_a = 2\cdot\Big(\frac{\lambda_1}{|\mathcal{R}|}\Big)^{1/2} = \sqrt{\frac{2\,a_1}{|\mathcal{R}|}}, \tag{10.36}$$

$$r_b = 2\cdot\Big(\frac{\lambda_2}{|\mathcal{R}|}\Big)^{1/2} = \sqrt{\frac{2\,a_2}{|\mathcal{R}|}}, \tag{10.37}$$

respectively, with a1, a2 as defined in Eqn. (10.34) and |R| being the number of pixels in the region. Given the axes' lengths ra, rb and the centroid (x̄, ȳ), the parametric equation of this ellipse is

$$\begin{pmatrix} x(t) \\ y(t) \end{pmatrix} = \begin{pmatrix} \bar{x} \\ \bar{y} \end{pmatrix} + \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \cdot \begin{pmatrix} r_a \cdot \cos t \\ r_b \cdot \sin t \end{pmatrix} \tag{10.38}$$

$$= \begin{pmatrix} \bar{x} + \cos\theta \cdot r_a \cdot \cos t - \sin\theta \cdot r_b \cdot \sin t \\ \bar{y} + \sin\theta \cdot r_a \cdot \cos t + \cos\theta \cdot r_b \cdot \sin t \end{pmatrix}, \tag{10.39}$$

for 0 ≤ t < 2π. If entirely filled, the region described by this ellipse would have the same normalized second-order central moments (μpq/|R| for p + q = 2) as the original region R. Figure 10.19 shows a set of regions with overlaid orientation and eccentricity results.
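To make Eqns. (10.34) to (10.39) concrete, the following Java sketch computes them directly from a region's pixel coordinates. It is a minimal illustration under stated assumptions, not the imagingbook implementation: the class name RegionEllipse and its methods are hypothetical, the region is passed as an array of integer pairs {u, v}, and the orientation is taken from line 2 of Alg. 10.5, reading ArcTan(x, y) as Java's Math.atan2(y, x).

```java
/**
 * Minimal sketch (hypothetical class, not the imagingbook API) of the eccentricity
 * (Eqn. 10.34) and the equivalent ellipse (Eqns. 10.36 to 10.39) of a binary region,
 * given as an array of integer pixel coordinates {u, v}.
 */
final class RegionEllipse {
    final int n;                    // |R|, number of region pixels
    final double xc, yc;            // centroid
    final double mu20, mu02, mu11;  // central moments (Eqn. 10.23)

    RegionEllipse(int[][] points) {
        n = points.length;
        double sx = 0, sy = 0;
        for (int[] p : points) { sx += p[0]; sy += p[1]; }
        xc = sx / n;
        yc = sy / n;
        double m20 = 0, m02 = 0, m11 = 0;
        for (int[] p : points) {
            double x = p[0] - xc, y = p[1] - yc;
            m20 += x * x;
            m02 += y * y;
            m11 += x * y;
        }
        mu20 = m20;
        mu02 = m02;
        mu11 = m11;
    }

    // sqrt((mu20 - mu02)^2 + 4 * mu11^2), shared by a1 and a2
    private double root() {
        return Math.sqrt((mu20 - mu02) * (mu20 - mu02) + 4 * mu11 * mu11);
    }

    double a1() { return mu20 + mu02 + root(); }   // = 2 * lambda1
    double a2() { return mu20 + mu02 - root(); }   // = 2 * lambda2

    // Eqn. (10.34); not guarded against a2 == 0 (e.g., a straight one-pixel-wide line)
    double eccentricity() { return a1() / a2(); }

    // Orientation as in line 2 of Alg. 10.5 (Eqn. 10.28), with ArcTan(x, y) = atan2(y, x)
    double theta() { return 0.5 * Math.atan2(2 * mu11, mu20 - mu02); }

    double ra() { return Math.sqrt(2 * a1() / n); }  // Eqn. (10.36)
    double rb() { return Math.sqrt(2 * a2() / n); }  // Eqn. (10.37)

    // Point on the equivalent ellipse for 0 <= t < 2 pi (Eqn. 10.39)
    double[] ellipsePoint(double t) {
        double c = Math.cos(theta()), s = Math.sin(theta());
        return new double[] {
            xc + c * ra() * Math.cos(t) - s * rb() * Math.sin(t),
            yc + s * ra() * Math.cos(t) + c * rb() * Math.sin(t)
        };
    }

    // Helper: all pixels of an axis-aligned w x h block
    static int[][] rectangle(int w, int h) {
        int[][] pts = new int[w * h][];
        int k = 0;
        for (int v = 0; v < h; v++)
            for (int u = 0; u < w; u++)
                pts[k++] = new int[] {u, v};
        return pts;
    }

    public static void main(String[] args) {
        RegionEllipse e = new RegionEllipse(rectangle(5, 3));
        System.out.println(e.eccentricity());          // 3.0
        System.out.println(e.ra() + ", " + e.rb());    // semi-axis lengths
    }
}
```

For a 5 × 3 block of pixels, μ20 = 30, μ02 = 10 and μ11 = 0, so a1 = 60, a2 = 20 and Ecc = 3; for a square block, μ20 = μ02 and the eccentricity is exactly 1.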
¹⁰ Up to the constant factor 1/|R|, A is the covariance matrix for the distribution of pixel positions inside the region (see Sec. D.2 in the Appendix).


10.6  Moment-Based 
Geometric  Properties 


Fig. 10.19

Orientation and eccentricity examples. The orientation θ (Eqn. (10.28)) is displayed for each connected region as a vector with the length proportional to the region's eccentricity value Ecc(R) (Eqn. (10.34)). Also shown are the ellipses (Eqns. (10.36) and (10.37)) corresponding to the orientation and eccentricity parameters.


10.6.3  Bounding  Box  Aligned  to  the  Major  Axis 

While the ordinary, x/y axis-aligned bounding box (see Sec. 10.4.2) is of little practical use (because it is sensitive to rotation), it may be interesting to see how to find a region's bounding box that is aligned with its major axis, as defined in Sec. 10.6.1. Given a region's orientation angle θR,

$$\mathbf{e}_a = \begin{pmatrix} \cos\theta_{\mathcal{R}} \\ \sin\theta_{\mathcal{R}} \end{pmatrix} \tag{10.40}$$

is the unit vector parallel to its major axis; thus

$$\mathbf{e}_b = \mathbf{e}_a^{\perp} = \begin{pmatrix} \sin\theta_{\mathcal{R}} \\ -\cos\theta_{\mathcal{R}} \end{pmatrix} \tag{10.41}$$

is the unit vector orthogonal to e_a.¹¹ The bounding box can now be determined as follows (see Fig. 10.20):

1. Project each region point¹² u_i = (u_i, v_i) onto the vector e_a (parallel to the region's major axis) by calculating the dot product¹³

   $$a_i = \mathbf{u}_i \cdot \mathbf{e}_a \tag{10.42}$$

   and keeping the minimum and maximum values

   $$a_{\min} = \min_{\mathbf{u}_i \in \mathcal{R}} a_i, \qquad a_{\max} = \max_{\mathbf{u}_i \in \mathcal{R}} a_i. \tag{10.43}$$

2. Analogously, project each region point u_i onto the orthogonal axis (specified by the vector e_b) by

   $$b_i = \mathbf{u}_i \cdot \mathbf{e}_b \tag{10.44}$$

   and keeping the minimum and maximum values, that is,

   $$b_{\min} = \min_{\mathbf{u}_i \in \mathcal{R}} b_i, \qquad b_{\max} = \max_{\mathbf{u}_i \in \mathcal{R}} b_i. \tag{10.45}$$

   Note that steps 1 and 2 can be performed in a single iteration over all region points.

3. Finally, from the resulting quantities a_min, a_max, b_min, b_max, calculate the four corner points A, B, C, D of the bounding box as

   $$\mathbf{A} = a_{\min}\cdot\mathbf{e}_a + b_{\min}\cdot\mathbf{e}_b, \qquad \mathbf{B} = a_{\min}\cdot\mathbf{e}_a + b_{\max}\cdot\mathbf{e}_b,$$
   $$\mathbf{C} = a_{\max}\cdot\mathbf{e}_a + b_{\max}\cdot\mathbf{e}_b, \qquad \mathbf{D} = a_{\max}\cdot\mathbf{e}_a + b_{\min}\cdot\mathbf{e}_b. \tag{10.46}$$

¹¹ $\mathbf{x}^{\perp} = \mathrm{perp}(\mathbf{x}) = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} \cdot \mathbf{x}$.

¹² Of course, if the region's contour is available, it is sufficient to iterate over the contour points only.

¹³ See Sec. B.3.1, Eqn. (B.19) in the Appendix.

Fig. 10.20

Calculation of a region's major axis-aligned bounding box. The unit vector e_a is parallel to the region's major axis (oriented at angle θ); e_b is perpendicular to e_a. The projection of a region point u_i onto the lines defined by e_a and e_b yields the lengths a_i and b_i, respectively (measured from the coordinate origin). The resulting quantities a_min, a_max, b_min, b_max define the corner points (A, B, C, D) of the major axis-aligned bounding box. Note that the position of the region's centroid (x̄) is not required in this calculation.


The complete calculation is summarized in Alg. 10.5; a typical example is shown in Fig. 10.21(d).


Alg. 10.5

Calculation of the major axis-aligned bounding box for a binary region R. If the region's contour is available, it is sufficient to use the contour points only.

 1: MajorAxisAlignedBoundingBox(R)
    Input: R = {u_i}, a binary region containing points u_i ∈ ℝ².
    Returns the four corner points of the region's bounding box.
 2:   θ ← 0.5 · ArcTan(μ20(R) − μ02(R), 2 · μ11(R))    ▷ see Eq. 10.28
 3:   e_a ← (cos(θ), sin(θ))ᵀ                         ▷ unit vector parallel to region's major axis
 4:   e_b ← (sin(θ), −cos(θ))ᵀ                        ▷ unit vector perpendicular to major axis
 5:   a_min ← ∞,  a_max ← −∞
 6:   b_min ← ∞,  b_max ← −∞
 7:   for all u ∈ R do
 8:     a ← u · e_a                                   ▷ project u onto e_a (Eq. 10.42)
 9:     a_min ← min(a_min, a)
10:     a_max ← max(a_max, a)
11:     b ← u · e_b                                   ▷ project u onto e_b (Eq. 10.44)
12:     b_min ← min(b_min, b)
13:     b_max ← max(b_max, b)
14:   A ← a_min · e_a + b_min · e_b
15:   B ← a_min · e_a + b_max · e_b
16:   C ← a_max · e_a + b_max · e_b
17:   D ← a_max · e_a + b_min · e_b
18:   return (A, B, C, D)                             ▷ corners of the bounding box
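Alg. 10.5 translates almost line by line into plain Java, as sketched below. The class name MajorAxisBoundingBox and the {u, v} point representation are illustrative assumptions (this is not the imagingbook code), and ArcTan(x, y) is again read as Math.atan2(y, x). Corners are returned in the order A, B, C, D of Eqn. (10.46).

```java
import java.util.Arrays;

/** Plain-Java sketch of Alg. 10.5 (hypothetical class, not the imagingbook API). */
final class MajorAxisBoundingBox {

    /** Returns the corners {A, B, C, D} as {x, y} pairs for pixels given as {u, v}. */
    static double[][] corners(int[][] points) {
        // Central moments (Eqn. 10.23) are needed only for the orientation theta
        int n = points.length;
        double xc = 0, yc = 0;
        for (int[] p : points) { xc += p[0]; yc += p[1]; }
        xc /= n;
        yc /= n;
        double mu20 = 0, mu02 = 0, mu11 = 0;
        for (int[] p : points) {
            double x = p[0] - xc, y = p[1] - yc;
            mu20 += x * x;
            mu02 += y * y;
            mu11 += x * y;
        }
        double theta = 0.5 * Math.atan2(2 * mu11, mu20 - mu02);   // line 2
        double[] ea = {Math.cos(theta), Math.sin(theta)};          // parallel to major axis
        double[] eb = {Math.sin(theta), -Math.cos(theta)};         // perpendicular to it

        double aMin = Double.POSITIVE_INFINITY, aMax = Double.NEGATIVE_INFINITY;
        double bMin = Double.POSITIVE_INFINITY, bMax = Double.NEGATIVE_INFINITY;
        for (int[] p : points) {                    // single pass over all points
            double a = p[0] * ea[0] + p[1] * ea[1]; // Eqn. (10.42)
            double b = p[0] * eb[0] + p[1] * eb[1]; // Eqn. (10.44)
            aMin = Math.min(aMin, a);
            aMax = Math.max(aMax, a);
            bMin = Math.min(bMin, b);
            bMax = Math.max(bMax, b);
        }
        return new double[][] {
            combine(aMin, ea, bMin, eb),   // A
            combine(aMin, ea, bMax, eb),   // B
            combine(aMax, ea, bMax, eb),   // C
            combine(aMax, ea, bMin, eb)    // D
        };
    }

    // s * ea + t * eb (Eqn. 10.46)
    private static double[] combine(double s, double[] ea, double t, double[] eb) {
        return new double[] {s * ea[0] + t * eb[0], s * ea[1] + t * eb[1]};
    }

    // Helper: all pixels of an axis-aligned w x h block
    static int[][] rectangle(int w, int h) {
        int[][] pts = new int[w * h][];
        int k = 0;
        for (int v = 0; v < h; v++)
            for (int u = 0; u < w; u++)
                pts[k++] = new int[] {u, v};
        return pts;
    }

    public static void main(String[] args) {
        for (double[] c : corners(rectangle(5, 3)))
            System.out.println(Arrays.toString(c));
    }
}
```

For an axis-aligned 5 × 3 block the box coincides with the ordinary bounding box, while for the same block rotated by 90° the side lengths (2 and 4 pixel units between outermost pixel centers) stay the same, which is the point of aligning the box with the major axis.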


Fig. 10.21

Geometric region properties. Original binary image (a), centroid and orientation vector (length determined by the region's eccentricity) of the major axis (b), convex hull (c), and major axis-aligned bounding box (d).


10.6.4  Invariant  Region  Moments 

Normalized  central  moments  are  not  affected  by  the  translation  or 
uniform  scaling  of  a  region  (i.e.,  the  values  are  invariant),  but  in 
general  rotating  the  image  will  change  these  values. 

Hu’s  invariant  moments 

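A classical solution to this problem is a clever combination of simpler features known as "Hu's Moments" [112]:¹⁴

$$\begin{aligned}
H_1 &= \eta_{20} + \eta_{02}, \\
H_2 &= (\eta_{20} - \eta_{02})^2 + 4\,\eta_{11}^2, \\
H_3 &= (\eta_{30} - 3\eta_{12})^2 + (3\eta_{21} - \eta_{03})^2, \\
H_4 &= (\eta_{30} + \eta_{12})^2 + (\eta_{21} + \eta_{03})^2, \\
H_5 &= (\eta_{30} - 3\eta_{12}) \cdot (\eta_{30} + \eta_{12}) \cdot \big[(\eta_{30} + \eta_{12})^2 - 3(\eta_{21} + \eta_{03})^2\big] \\
    &\quad + (3\eta_{21} - \eta_{03}) \cdot (\eta_{21} + \eta_{03}) \cdot \big[3(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2\big], \\
H_6 &= (\eta_{20} - \eta_{02}) \cdot \big[(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2\big] \\
    &\quad + 4\,\eta_{11} \cdot (\eta_{30} + \eta_{12}) \cdot (\eta_{21} + \eta_{03}), \\
H_7 &= (3\eta_{21} - \eta_{03}) \cdot (\eta_{30} + \eta_{12}) \cdot \big[(\eta_{30} + \eta_{12})^2 - 3(\eta_{21} + \eta_{03})^2\big] \\
    &\quad + (3\eta_{12} - \eta_{30}) \cdot (\eta_{21} + \eta_{03}) \cdot \big[3(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2\big].
\end{aligned} \tag{10.47}$$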


¹⁴ In order to improve the legibility of Eqn. (10.47), the argument for the region (R) has been dropped; as an example, with the region argument, the first line would read H1(R) = η20(R) + η02(R), and so on.
10 Regions in Binary Images

In practice, the logarithm of these quantities (that is, log(Hk)) is used, since the raw values may have a very large range. These features are also known as moment invariants since they are invariant under translation, rotation, and scaling. While defined here for binary images, they are also applicable to parts of grayscale images; examples can be found in [88, p. 517].
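The seven expressions in Eqn. (10.47) can be checked numerically with the following Java sketch. It assumes normalized central moments of the form η_pq = μ_pq / μ00^((p+q+2)/2), with μ00 = |R| for a binary region, which is our reading of Eqn. (10.26); the class name HuMoments is hypothetical and the region is given as integer pixel coordinates {u, v}.

```java
import java.util.Arrays;

/** Sketch of Hu's moments H1..H7 (Eqn. 10.47) for a binary region (hypothetical class). */
final class HuMoments {

    static double[] of(int[][] points) {
        int n = points.length;                 // mu00 = |R| for a binary region
        double xc = 0, yc = 0;
        for (int[] p : points) { xc += p[0]; yc += p[1]; }
        xc /= n;
        yc /= n;
        // Central moments mu[p][q] for p + q <= 3
        double[][] mu = new double[4][4];
        for (int[] pt : points) {
            double x = pt[0] - xc, y = pt[1] - yc;
            for (int p = 0; p <= 3; p++)
                for (int q = 0; p + q <= 3; q++)
                    mu[p][q] += Math.pow(x, p) * Math.pow(y, q);
        }
        // Normalized central moments eta_pq = mu_pq / mu00^((p+q+2)/2) (assumed Eqn. 10.26)
        double[][] eta = new double[4][4];
        for (int p = 0; p <= 3; p++)
            for (int q = 0; p + q <= 3; q++)
                eta[p][q] = mu[p][q] / Math.pow(n, (p + q + 2) / 2.0);

        double n20 = eta[2][0], n02 = eta[0][2], n11 = eta[1][1];
        double n30 = eta[3][0], n03 = eta[0][3], n21 = eta[2][1], n12 = eta[1][2];
        double s1 = n30 + n12, s2 = n21 + n03;  // sums that recur in H4..H7
        double h1 = n20 + n02;
        double h2 = sq(n20 - n02) + 4 * n11 * n11;
        double h3 = sq(n30 - 3 * n12) + sq(3 * n21 - n03);
        double h4 = sq(s1) + sq(s2);
        double h5 = (n30 - 3 * n12) * s1 * (sq(s1) - 3 * sq(s2))
                  + (3 * n21 - n03) * s2 * (3 * sq(s1) - sq(s2));
        double h6 = (n20 - n02) * (sq(s1) - sq(s2)) + 4 * n11 * s1 * s2;
        double h7 = (3 * n21 - n03) * s1 * (sq(s1) - 3 * sq(s2))
                  + (3 * n12 - n30) * s2 * (3 * sq(s1) - sq(s2));
        return new double[] {h1, h2, h3, h4, h5, h6, h7};
    }

    private static double sq(double x) { return x * x; }

    public static void main(String[] args) {
        int[][] lShape = {{0, 0}, {1, 0}, {2, 0}, {3, 0}, {0, 1}, {0, 2}, {1, 2}};
        System.out.println(Arrays.toString(of(lShape)));
    }
}
```

Rotating a pixel set by exactly 90° (u, v) → (−v, u) maps the moments onto each other without resampling, so the seven values should agree up to rounding; arbitrary angles and scalings of a discrete image only preserve them approximately.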

Flusser’s  invariant  moments 

It was shown in [72, 73] that Hu's moments, as listed in Eqn. (10.47), are partially redundant and incomplete. Based on so-called complex moments c_pq ∈ ℂ, Flusser designed an improved set of 11 rotation and scale-invariant features ψ1, ..., ψ11 (see Eqn. (10.51)) for characterizing 2D shapes. For grayscale images (with I(u, v) ∈ ℝ), the complex moments of order p, q are defined as

$$c_{pq}(\mathcal{R}) = \sum_{(u,v)\in\mathcal{R}} I(u,v) \cdot (x + \mathrm{i}\cdot y)^p \cdot (x - \mathrm{i}\cdot y)^q, \tag{10.48}$$

with centered positions x = u − x̄ and y = v − ȳ, and (x̄, ȳ) being the centroid of R (i denotes the imaginary unit). In the case of binary images (with I(u, v) ∈ {0, 1}), Eqn. (10.48) simplifies to

$$c_{pq}(\mathcal{R}) = \sum_{(u,v)\in\mathcal{R}} (x + \mathrm{i}\cdot y)^p \cdot (x - \mathrm{i}\cdot y)^q. \tag{10.49}$$

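Analogous to Eqn. (10.26), the complex moments can be scale-normalized to

$$\hat{c}_{pq}(\mathcal{R}) = \frac{1}{A^{(p+q+2)/2}} \cdot c_{pq}(\mathcal{R}), \tag{10.50}$$

with A being the area of R [74, p. 29]. Finally, the derived rotation and scale invariant region moments of 2nd to 4th order are¹⁵

$$\begin{aligned}
\psi_1 &= \mathrm{Re}(\hat{c}_{11}), & \psi_2 &= \mathrm{Re}(\hat{c}_{21}\cdot\hat{c}_{12}), \\
\psi_3 &= \mathrm{Re}(\hat{c}_{20}\cdot\hat{c}_{12}^2), & \psi_4 &= \mathrm{Im}(\hat{c}_{20}\cdot\hat{c}_{12}^2), \\
\psi_5 &= \mathrm{Re}(\hat{c}_{30}\cdot\hat{c}_{12}^3), & \psi_6 &= \mathrm{Im}(\hat{c}_{30}\cdot\hat{c}_{12}^3), \\
\psi_7 &= \mathrm{Re}(\hat{c}_{22}), & & \\
\psi_8 &= \mathrm{Re}(\hat{c}_{31}\cdot\hat{c}_{12}^2), & \psi_9 &= \mathrm{Im}(\hat{c}_{31}\cdot\hat{c}_{12}^2), \\
\psi_{10} &= \mathrm{Re}(\hat{c}_{40}\cdot\hat{c}_{12}^4), & \psi_{11} &= \mathrm{Im}(\hat{c}_{40}\cdot\hat{c}_{12}^4).
\end{aligned} \tag{10.51}$$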

Table  10.1  lists  the  normalized  Flusser  moments  for  five  binary  shapes 
taken  from  the  Kimia  dataset  [134]. 
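A compact Java sketch of Eqns. (10.49) to (10.51) for a binary region follows. Since the Java standard library has no complex number type, complex values are held as double[2] = {re, im}; the area A in Eqn. (10.50) is assumed to be the pixel count |R|; and the class name FlusserMoments is hypothetical. It is meant for checking the formulas, not as an efficient implementation (every moment is a separate pass over the points).

```java
import java.util.Arrays;

/** Sketch of Flusser's invariants psi_1..psi_11 (Eqn. 10.51); complex = double[]{re, im}. */
final class FlusserMoments {

    static double[] mul(double[] a, double[] b) {
        return new double[] {a[0] * b[0] - a[1] * b[1], a[0] * b[1] + a[1] * b[0]};
    }

    static double[] pow(double[] a, int k) {
        double[] r = {1, 0};
        for (int i = 0; i < k; i++) r = mul(r, a);
        return r;
    }

    /** Scale-normalized complex moment (Eqns. 10.49, 10.50), assuming A = |R|. */
    static double[] moment(int[][] pts, double xc, double yc, int p, int q) {
        double re = 0, im = 0;
        for (int[] pt : pts) {
            double x = pt[0] - xc, y = pt[1] - yc;
            double[] t = mul(pow(new double[] {x, y}, p), pow(new double[] {x, -y}, q));
            re += t[0];
            im += t[1];
        }
        double s = Math.pow(pts.length, (p + q + 2) / 2.0);
        return new double[] {re / s, im / s};
    }

    /** Returns {psi_1, ..., psi_11} for pixels given as {u, v}. */
    static double[] of(int[][] pts) {
        double xc = 0, yc = 0;
        for (int[] p : pts) { xc += p[0]; yc += p[1]; }
        xc /= pts.length;
        yc /= pts.length;
        double[] c11 = moment(pts, xc, yc, 1, 1), c12 = moment(pts, xc, yc, 1, 2);
        double[] c20 = moment(pts, xc, yc, 2, 0), c21 = moment(pts, xc, yc, 2, 1);
        double[] c22 = moment(pts, xc, yc, 2, 2), c30 = moment(pts, xc, yc, 3, 0);
        double[] c31 = moment(pts, xc, yc, 3, 1), c40 = moment(pts, xc, yc, 4, 0);
        double[] t3 = mul(c20, pow(c12, 2));    // psi_3, psi_4
        double[] t5 = mul(c30, pow(c12, 3));    // psi_5, psi_6
        double[] t8 = mul(c31, pow(c12, 2));    // psi_8, psi_9
        double[] t10 = mul(c40, pow(c12, 4));   // psi_10, psi_11
        return new double[] {
            c11[0], mul(c21, c12)[0], t3[0], t3[1], t5[0], t5[1],
            c22[0], t8[0], t8[1], t10[0], t10[1]
        };
    }

    public static void main(String[] args) {
        int[][] lShape = {{0, 0}, {1, 0}, {2, 0}, {3, 0}, {0, 1}, {0, 2}, {1, 2}};
        System.out.println(Arrays.toString(of(lShape)));
    }
}
```

The rotation invariance follows from the exponents: a rotation by α multiplies c_pq by e^(i(p−q)α), and each product in Eqn. (10.51) has a total p − q of zero (e.g., c20·c12² gives 2 − 2 = 0).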

Shape  matching  with  region  moments 

One obvious use of invariant region moments is shape matching and classification. Given two binary shapes A and B, with associated moment ("feature") vectors

$$\mathbf{f}_A = (\psi_1(A), \ldots, \psi_{11}(A)) \quad\text{and}\quad \mathbf{f}_B = (\psi_1(B), \ldots, \psi_{11}(B)),$$

respectively, one approach could be to simply measure the difference between shapes by the Euclidean distance of these vectors in the form
¹⁵ In Eqn. (10.51), the use of Re( ) for the quantities ψ1, ψ2, ψ7 (which are real-valued per se) is redundant.


Table 10.1

Binary shapes and associated normalized Flusser moments ψ1, ..., ψ11. Notice the magnitude of the moments varies by a large factor.

          Shape 1          Shape 2          Shape 3          Shape 4          Shape 5
ψ1     0.3730017575     0.2545476083     0.2154034257     0.2124041195     0.3600613700
ψ2     0.0012699373     0.0004247053     0.0002068089     0.0001089652     0.0017187073
ψ3     0.0004041515     0.0000644829     0.0000274491     0.0000014248    -0.0003853999
ψ4     0.0000097827    -0.0000076547     0.0000071688    -0.0000022103    -0.0001944121
ψ5     0.0000012672     0.0000002327     0.0000000637     0.0000000083    -0.0000078073
ψ6     0.0000001090    -0.0000000483     0.0000000041     0.0000000153    -0.0000061997
ψ7     0.2687922057     0.1289708408     0.0814034374     0.0712567626     0.2340886626
ψ8     0.0003192443     0.0000414818     0.0000134036     0.0000003020    -0.0002878997
ψ9     0.0000053208    -0.0000032541     0.0000030880    -0.0000008365    -0.0001628669
ψ10    0.0000103461     0.0000000091     0.0000000019    -0.0000000003     0.0000001922
ψ11    0.0000000120    -0.0000000020     0.0000000008    -0.0000000000     0.0000003015


Table 10.2

Inter-class (Euclidean) distances dE(A, B) between normalized shape feature vectors for the five reference shapes (see Eqn. (10.52)). Off-diagonal values should be consistently large to allow good shape discrimination.

            Shape 1   Shape 2   Shape 3   Shape 4   Shape 5
Shape 1      0.000     0.183     0.245     0.255     0.037
Shape 2      0.183     0.000     0.062     0.071     0.149
Shape 3      0.245     0.062     0.000     0.011     0.210
Shape 4      0.255     0.071     0.011     0.000     0.220
Shape 5      0.037     0.149     0.210     0.220     0.000


$$d_E(A, B) = \|\mathbf{f}_A - \mathbf{f}_B\| = \Big[\sum_{i=1}^{11} \big|\psi_i(A) - \psi_i(B)\big|^2\Big]^{1/2}. \tag{10.52}$$


Concrete distances between the five sample shapes are listed in Table 10.2. Since the moment vectors are rotation and scale invariant,¹⁶ shape comparisons should remain unaffected by such transformations. Note, however, that the magnitude of the individual moments varies over a very large range; in Table 10.1, for example, ψ1 and ψ7 are of order 0.1, whereas ψ5, ψ6, and ψ11 stay below 10⁻⁵ in magnitude. Thus, if the Euclidean distance is used as we have just suggested, the comparison (matching) of shapes is typically dominated by a few moments (or even a single moment) of relatively large magnitude, while the small-valued moments play virtually no role in the distance calculation. This is because the Euclidean distance weights all dimensions of the feature space equally.

As a consequence, moment-based shape discrimination with the ordinary Euclidean distance is typically not very selective. A simple solution is to replace Eqn. (10.52) by a weighted distance measure of the form

$$d'_E(A, B) = \Big[\sum_{i=1}^{11} w_i \cdot \big|\psi_i(A) - \psi_i(B)\big|^2\Big]^{1/2}, \tag{10.53}$$

with fixed weights w1, ..., w11 > 0 assigned to each moment feature to compensate for the differences in magnitude.
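Both Eqn. (10.52) and Eqn. (10.53) amount to a few lines of Java. In the sketch below (hypothetical class ShapeDistance), the Euclidean distance is simply the weighted one with all w_i = 1; how to choose the weights is left open, as in the text. One natural choice, not prescribed here, is the inverse variance of each feature over a sample of shapes.

```java
import java.util.Arrays;

/** Sketch of the Euclidean (Eqn. 10.52) and weighted (Eqn. 10.53) feature distances. */
final class ShapeDistance {

    static double euclidean(double[] fA, double[] fB) {
        double[] w = new double[fA.length];
        Arrays.fill(w, 1.0);                 // unit weights give Eqn. (10.52)
        return weighted(fA, fB, w);
    }

    static double weighted(double[] fA, double[] fB, double[] w) {
        if (fA.length != fB.length || w.length != fA.length)
            throw new IllegalArgumentException("vectors must have equal length");
        double sum = 0;
        for (int i = 0; i < fA.length; i++) {
            double d = fA[i] - fB[i];
            sum += w[i] * d * d;
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        // Columns "Shape 1" and "Shape 2" of Table 10.1
        double[] s1 = {0.3730017575, 0.0012699373, 0.0004041515, 0.0000097827, 0.0000012672,
                       0.0000001090, 0.2687922057, 0.0003192443, 0.0000053208, 0.0000103461,
                       0.0000000120};
        double[] s2 = {0.2545476083, 0.0004247053, 0.0000644829, -0.0000076547, 0.0000002327,
                       -0.0000000483, 0.1289708408, 0.0000414818, -0.0000032541, 0.0000000091,
                       -0.0000000020};
        System.out.println(euclidean(s1, s2));   // about 0.183, cf. Table 10.2
    }
}
```

Applied to the first two columns of Table 10.1, euclidean() yields approximately 0.1833, and almost all of it comes from ψ1 and ψ7, which illustrates why the unweighted distance is dominated by the large-magnitude moments.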

A more elegant approach is to use the Mahalanobis distance [24, 157] for comparing the moment vectors, which accounts for the statistical distribution of each vector component and avoids large-magnitude components dominating the smaller ones. In this case,

¹⁶ Although the invariance property holds perfectly for continuous shapes, rotating and scaling discrete binary images may significantly affect the associated region moments.
the distance calculation becomes

$$d_M(A, B) = \big[(\mathbf{f}_A - \mathbf{f}_B)^{\mathsf{T}} \cdot \boldsymbol{\Sigma}^{-1} \cdot (\mathbf{f}_A - \mathbf{f}_B)\big]^{1/2}, \tag{10.54}$$

where Σ is the 11×11 covariance matrix for the moment vectors f. Note that the expression under the root in Eqn. (10.54) is the product of the row vector (f_A − f_B)ᵀ and the column vector Σ⁻¹·(f_A − f_B); that is, the result is a non-negative scalar value. The Mahalanobis distance can be viewed as a generalization of the weighted Euclidean distance (Eqn. (10.53)), where the weights are determined by the variability of the individual vector components; if Σ is diagonal, with variances σ1², ..., σ11², it reduces exactly to Eqn. (10.53) with weights wi = 1/σi². See Sec. D.3 in the Appendix and Exercise 10.16 for additional details.
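A minimal Java sketch of Eqn. (10.54) follows (hypothetical class Mahalanobis). Instead of inverting Σ explicitly, it solves Σ·x = f_A − f_B by Gaussian elimination with partial pivoting, a standard numerical choice made here rather than something the text prescribes. The sample-covariance helper normalizes by n − 1; that normalization is our assumption, and Sec. D.3 may use a different one.

```java
/** Sketch of the Mahalanobis distance (Eqn. 10.54) and a sample covariance estimate. */
final class Mahalanobis {

    static double distance(double[] fA, double[] fB, double[][] sigma) {
        int m = fA.length;
        double[] d = new double[m];
        for (int i = 0; i < m; i++) d[i] = fA[i] - fB[i];
        double[] x = solve(sigma, d);            // x = Sigma^-1 * d, no explicit inverse
        double q = 0;
        for (int i = 0; i < m; i++) q += d[i] * x[i];   // d^T * Sigma^-1 * d
        return Math.sqrt(q);
    }

    /** Solves S x = b by Gaussian elimination with partial pivoting (S must be nonsingular). */
    static double[] solve(double[][] S, double[] b) {
        int m = b.length;
        double[][] a = new double[m][m + 1];      // augmented copy; S is not modified
        for (int i = 0; i < m; i++) {
            System.arraycopy(S[i], 0, a[i], 0, m);
            a[i][m] = b[i];
        }
        for (int col = 0; col < m; col++) {
            int piv = col;
            for (int r = col + 1; r < m; r++)
                if (Math.abs(a[r][col]) > Math.abs(a[piv][col])) piv = r;
            if (a[piv][col] == 0) throw new ArithmeticException("covariance matrix is singular");
            double[] tmp = a[col]; a[col] = a[piv]; a[piv] = tmp;
            for (int r = col + 1; r < m; r++) {
                double f = a[r][col] / a[col][col];
                for (int c = col; c <= m; c++) a[r][c] -= f * a[col][c];
            }
        }
        double[] x = new double[m];
        for (int i = m - 1; i >= 0; i--) {        // back substitution
            double s = a[i][m];
            for (int j = i + 1; j < m; j++) s -= a[i][j] * x[j];
            x[i] = s / a[i][i];
        }
        return x;
    }

    /** Sample covariance of the rows of F (one feature vector per row), normalized by n - 1. */
    static double[][] covariance(double[][] F) {
        int n = F.length, m = F[0].length;
        double[] mean = new double[m];
        for (double[] f : F)
            for (int j = 0; j < m; j++) mean[j] += f[j] / n;
        double[][] S = new double[m][m];
        for (double[] f : F)
            for (int j = 0; j < m; j++)
                for (int k = 0; k < m; k++)
                    S[j][k] += (f[j] - mean[j]) * (f[k] - mean[k]) / (n - 1);
        return S;
    }

    public static void main(String[] args) {
        double[][] sigma = {{2, 1}, {1, 2}};
        System.out.println(distance(new double[] {1, 1}, new double[] {0, 0}, sigma));
    }
}
```

Note that a covariance matrix estimated from n feature vectors has rank at most n − 1, so for the 11 Flusser features it is singular (and solve() fails) unless considerably more than 11 shapes are used; this is one reason for the larger shape set suggested in Exercise 10.16(B).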


10.7  Projections 

Image projections are 1D representations of the image contents, usually calculated parallel to the coordinate axes. In this case, the horizontal and vertical projections of a scalar-valued image I(u, v) of size M × N are defined as

$$P_{\mathrm{hor}}(v) = \sum_{u=0}^{M-1} I(u, v) \quad \text{for } 0 \le v < N, \tag{10.55}$$

$$P_{\mathrm{ver}}(u) = \sum_{v=0}^{N-1} I(u, v) \quad \text{for } 0 \le u < M. \tag{10.56}$$

The horizontal projection Phor(v0) (Eqn. (10.55)) is the sum of the pixel values in the image row v0, and the vector Phor has length N, corresponding to the height of the image. Analogously, the vertical projection Pver(u0) (Eqn. (10.56)) is the sum of all the values in the image column u0, and Pver has length M, the width of the image. In the case of a binary image with I(u, v) ∈ {0, 1}, the projection contains the count of the foreground pixels in the corresponding image row or column.

Program 10.4 gives a direct implementation of the projection calculations as the run() method for an ImageJ plugin, where projections in both directions are computed during a single traversal of the image.

Projections in the direction of the coordinate axes are often utilized to quickly analyze the structure of an image and isolate its component parts; for example, in document images they are used to separate graphic elements from text blocks as well as to isolate individual lines (see the example in Fig. 10.22). In practice, especially to account for document skew, projections are often computed along the major axis of an image region (Eqn. (10.28)). When the projection vectors of a region are computed in reference to the centroid of the region along the major axis, the result is a rotation-invariant vector description (often referred to as a "signature") of the region.
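One possible reading of such a major-axis "signature" is sketched below in Java. The assumptions are ours, not the book's: pixel positions are projected onto e_a (Eqn. (10.40)) relative to the centroid and counted in bins of unit width, and the class name MajorAxisSignature is hypothetical. Because θ and θ + π describe the same axis, the resulting vector may come out reversed, which a practical matcher would have to take into account.

```java
import java.util.Arrays;

/** Hypothetical sketch: projection of a region onto its major axis, relative to the centroid. */
final class MajorAxisSignature {

    static int[] of(int[][] points) {
        int n = points.length;
        double xc = 0, yc = 0;
        for (int[] p : points) { xc += p[0]; yc += p[1]; }
        xc /= n;
        yc /= n;
        double mu20 = 0, mu02 = 0, mu11 = 0;    // central moments (Eqn. 10.23)
        for (int[] p : points) {
            double x = p[0] - xc, y = p[1] - yc;
            mu20 += x * x;
            mu02 += y * y;
            mu11 += x * y;
        }
        double theta = 0.5 * Math.atan2(2 * mu11, mu20 - mu02);  // major-axis angle
        double ca = Math.cos(theta), sa = Math.sin(theta);

        // Signed position of each pixel along the major axis, measured from the centroid
        double[] a = new double[n];
        double aMin = Double.POSITIVE_INFINITY, aMax = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < n; i++) {
            a[i] = (points[i][0] - xc) * ca + (points[i][1] - yc) * sa;
            aMin = Math.min(aMin, a[i]);
            aMax = Math.max(aMax, a[i]);
        }
        // Count pixels per unit-width bin (rounding is monotone, so indices stay in range)
        int[] signature = new int[(int) Math.round(aMax - aMin) + 1];
        for (double ai : a) signature[(int) Math.round(ai - aMin)]++;
        return signature;
    }

    // Helper: all pixels of an axis-aligned w x h block
    static int[][] rectangle(int w, int h) {
        int[][] pts = new int[w * h][];
        int k = 0;
        for (int v = 0; v < h; v++)
            for (int u = 0; u < w; u++)
                pts[k++] = new int[] {u, v};
        return pts;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(of(rectangle(5, 3))));   // [3, 3, 3, 3, 3]
    }
}
```

A 5 × 3 block and the same block rotated by 90° (3 × 5) both produce the signature [3, 3, 3, 3, 3], whereas the ordinary horizontal projections of the two would differ.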


Prog. 10.4

Calculation of horizontal and vertical projections. The run() method for an ImageJ plugin (I is of type ByteProcessor or ShortProcessor) computes the projections in x and y directions simultaneously in a single traversal of the image. The projections are represented by the one-dimensional arrays pHor and pVer with elements of type int.

    public void run(ImageProcessor I) {
        int M = I.getWidth();
        int N = I.getHeight();
        int[] pHor = new int[N];   // = Phor(v)
        int[] pVer = new int[M];   // = Pver(u)
        for (int v = 0; v < N; v++) {
            for (int u = 0; u < M; u++) {
                int p = I.getPixel(u, v);
                pHor[v] += p;
                pVer[u] += p;
            }
        }
        // use projections pHor, pVer now
        // ...
    }

Fig. 10.22

Horizontal and vertical projections of a binary image.


10.8 Topological Region Properties

Topological features do not describe the shape of a region in continuous terms; instead, they capture its structural properties. Topological properties are typically invariant even under strong image transformations. The convexity of a region, which can be calculated from the convex hull (Sec. 10.4.2), is also a topological property.

A simple and robust topological feature is the number of holes NL(R) in a region. This feature is easily determined while finding the inner contours of a region, as described in Sec. 10.2.2.

A useful topological feature that can be derived directly from the number of holes is the so-called Euler number NE, which is the difference between the number of connected regions NR and the number of their holes NL, that is,

$$N_E(\mathcal{R}) = N_R(\mathcal{R}) - N_L(\mathcal{R}). \tag{10.57}$$

In the case of a single connected region this is simply 1 − NL. For a picture of the number "8", for example, NE = 1 − 2 = −1, and for the letter "D" we get NE = 1 − 1 = 0.
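The Euler number of a small binary image can be checked with the following Java sketch. Rather than the contour-based hole detection of Sec. 10.2.2, it counts components by plain flood filling, a substitute technique chosen for brevity: foreground regions are counted with the 8-neighborhood and background components with the 4-neighborhood (a common pairing, assumed here), and every background component other than the outer one counts as a hole. The class name EulerNumber is hypothetical.

```java
import java.util.ArrayDeque;

/** Sketch of N_E = N_R - N_L (Eqn. 10.57) for a binary image img[v][u] in {0, 1}. */
final class EulerNumber {

    static int of(int[][] img) {
        int h = img.length, w = img[0].length;
        int[][] p = new int[h + 2][w + 2];       // add a 1-pixel background frame
        for (int v = 0; v < h; v++)
            for (int u = 0; u < w; u++)
                p[v + 1][u + 1] = img[v][u];
        int regions = countComponents(p, 1, true);        // foreground, 8-neighborhood
        int holes = countComponents(p, 0, false) - 1;     // background minus the outer part
        return regions - holes;
    }

    static int countComponents(int[][] p, int value, boolean eightConnected) {
        int h = p.length, w = p[0].length;
        int[][] nbrs = eightConnected
            ? new int[][] {{1, 0}, {-1, 0}, {0, 1}, {0, -1}, {1, 1}, {1, -1}, {-1, 1}, {-1, -1}}
            : new int[][] {{1, 0}, {-1, 0}, {0, 1}, {0, -1}};
        boolean[][] seen = new boolean[h][w];
        ArrayDeque<int[]> stack = new ArrayDeque<>();
        int count = 0;
        for (int v = 0; v < h; v++) {
            for (int u = 0; u < w; u++) {
                if (p[v][u] != value || seen[v][u]) continue;
                count++;                          // new component: flood-fill it
                seen[v][u] = true;
                stack.push(new int[] {u, v});
                while (!stack.isEmpty()) {
                    int[] q = stack.pop();
                    for (int[] d : nbrs) {
                        int uu = q[0] + d[0], vv = q[1] + d[1];
                        if (uu >= 0 && uu < w && vv >= 0 && vv < h
                                && p[vv][uu] == value && !seen[vv][uu]) {
                            seen[vv][uu] = true;
                            stack.push(new int[] {uu, vv});
                        }
                    }
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        int[][] eight = {{1, 1, 1}, {1, 0, 1}, {1, 1, 1}, {1, 0, 1}, {1, 1, 1}};
        System.out.println(of(eight));   // -1, as for the digit "8"
    }
}
```

The padding frame guarantees that all background pixels touching the image border merge into a single outer component, so the remaining background components are exactly the holes.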

Topological features are often used in combination with numerical features for classification. A classic example of this combination is OCR (optical character recognition) [38]. Figure 10.23 shows an interesting use of topological structures for coding optical markers used in augmented reality applications [22].¹⁷ The recursive nesting of outer and inner regions is equivalent to a tree structure that allows fast and unique identification of a large number of known patterns (see also Exercise 10.21).

Fig. 10.23

Visual identification markers composed of recursively nested regions [22].


10.9  Java  Implementation 

Most algorithms described in this chapter are implemented as part of the imagingbook library.¹⁸ The key classes are BinaryRegion and Contour, the abstract class RegionLabeling and its concrete subclasses RecursiveLabeling, BreadthFirstLabeling, DepthFirstLabeling (Alg. 10.1) and SequentialLabeling (Alg. 10.2). The combined region labeling and contour tracing method (Algs. 10.3 and 10.4) is implemented by class RegionContourLabeling. Additional details can be found in the online documentation.

Example 

A complete example for the use of this API is shown in Prog. 10.5. Particularly useful is the facility for visiting all positions of a specific region using the iterator returned by method getRegionPoints(), as demonstrated by this code segment:

    RegionLabeling segmenter = ....

    // Get the largest region:
    BinaryRegion r = segmenter.getRegions(true).get(0);

    // Loop over all points of region r:
    for (Point p : r.getRegionPoints()) {
        int u = p.x;
        int v = p.y;
        // do something with position u, v
    }


10.10  Exercises 

Exercise 10.1. Manually simulate the execution of both variations (depth-first and breadth-first) of the flood-fill algorithm using the image in Fig. 10.24 and starting at position (5, 1).

¹⁷ http://reactivision.sourceforge.net/.

¹⁸ Package imagingbook.pub.regions.


 1  ...
 2  import imagingbook.pub.regions.BinaryRegion;
 3  import imagingbook.pub.regions.Contour;
 4  import imagingbook.pub.regions.ContourOverlay;
 5  import imagingbook.pub.regions.RegionContourLabeling;
 6  import java.awt.geom.Point2D;
 7  import java.util.List;
 8
 9  public class Region_Contours_Demo implements PlugInFilter {
10
11      public int setup(String arg, ImagePlus im) {
12          return DOES_8G + NO_CHANGES;
13      }
14
15      public void run(ImageProcessor ip) {
16          // Make sure we have a proper byte image:
17          ByteProcessor bp = ip.convertToByteProcessor();
18
19          // Create the region labeler / contour tracer:
20          RegionContourLabeling segmenter =
21              new RegionContourLabeling(bp);
22
23          // Get the list of detected regions (sort by size):
24          List<BinaryRegion> regions =
25              segmenter.getRegions(true);
26          if (regions.isEmpty()) {
27              IJ.error("No regions detected!");
28              return;
29          }
30
31          // List all regions:
32          IJ.log("Detected regions: " + regions.size());
33          for (BinaryRegion r : regions) {
34              IJ.log(r.toString());
35          }
36
37          // Get the outer contour of the largest region:
38          BinaryRegion largestRegion = regions.get(0);
39          Contour oc = largestRegion.getOuterContour();
40          IJ.log("Points on outer contour of largest region:");
41          Point2D[] points = oc.getPointArray();
42          for (int i = 0; i < points.length; i++) {
43              Point2D p = points[i];
44              IJ.log("Point " + i + ": " + p.toString());
45          }
46
47          // Get all inner contours of the largest region:
48          List<Contour> ics = largestRegion.getInnerContours();
49          IJ.log("Inner regions (holes): " + ics.size());
50      }
51  }


Prog. 10.5

Complete example for the use of the regions API. The ImageJ plugin Region_Contours_Demo segments the binary (8-bit grayscale) image ip into connected components. This is done with an instance of class RegionContourLabeling (see line 21), which also extracts the regions' contours. In line 25, a list of regions (sorted by size) is produced, which is subsequently traversed (line 33). The treatment of outer and inner contours as well as the iteration over individual contour points is shown in lines 38-49.
Fig. 10.24

Binary image for Exercise 10.1.
Exercise 10.2. The implementation of the flood-fill algorithm in Prog. 10.1 places all the neighboring pixels of each visited pixel into either the stack or the queue without ensuring they are foreground pixels and that they lie within the image boundaries. The number of items in the stack or the queue can be reduced by ignoring (not inserting) those neighboring pixels that do not meet the two conditions given. Modify the depth-first and breadth-first variants given in Prog. 10.1 accordingly and compare the new running times.

Exercise 10.3. The implementations of depth-first and breadth-first labeling shown in Prog. 10.1 will run significantly slower than the recursive version because the frequent creation of new Point objects is quite time consuming. Modify the depth-first version of Prog. 10.1 to use a stack with elements of a primitive type (e.g., int) instead. Note that (at least in Java)¹⁹ it is not possible to specify a built-in list structure (such as Deque or LinkedList) for a primitive element type. Implement your own stack class that internally uses an int-array to store the (u, v) coordinates. What is the maximum number of stack entries needed for a given image of size M × N? Compare the performance of your solution to the original version in Prog. 10.1.

Exercise 10.4. Implement an ImageJ plugin that encodes a given binary image by run-length encoding (Sec. 10.3.2) and stores it in a file. Develop a second plugin that reads the file and reconstructs the image.

Exercise 10.5. Calculate the amount of memory required to represent a contour with 1000 points in the following ways: (a) as a sequence of coordinate points stored as pairs of int values; (b) as an 8-chain code using Java byte elements; and (c) as an 8-chain code using only 3 bits per element.

Exercise 10.6. Implement a Java class for describing a binary image region using chain codes. It is up to you whether you want to use an absolute or differential chain code. The implementation should be able to encode closed contours as chain codes and also reconstruct the contours given a chain code.

Exercise 10.7. The Graham Scan method [91] is an efficient algorithm for calculating the convex hull of a 2D point set (of size n), with time complexity O(n · log(n)).²⁰ Implement this algorithm and show that it is sufficient to consider only the outer contour points of a region to calculate its convex hull.

¹⁹ Other languages like C# allow this.

²⁰ See also http://en.wikipedia.org/wiki/Graham_scan.
Exercise 10.8. While computing the convex hull of a region, the maximal diameter (maximum distance between two arbitrary points) can also be found easily. Devise an alternative method for computing this feature without using the convex hull. Determine the running time of your algorithm in terms of the number of points in the region.

Exercise 10.9. Implement an algorithm for comparing contours using their shape numbers (Eqn. (10.6)). For this purpose, develop a metric for measuring the distance between two normalized chain codes. Describe if, and under which conditions, the results will be reliable.

Exercise 10.10. Sketch the contour equivalent to the absolute chain code sequence c_R = (6, 7, 7, 1, 2, 0, 2, 3, 5, 4, 4). (a) Choose an arbitrary starting point and determine if the resulting contour is closed. (b) Find the associated differential chain code c'_R (Eqn. (10.5)).

Exercise 10.11. Calculate (under assumed 8-neighborhood) the shape number of base b = 8 (see Eqn. (10.6)) for the differential chain code c'_R = (1, 0, 2, 1, 6, 2, 1, 2, 7, 0, 2) and all possible circular shifts of this code. Which shift yields the maximum arithmetic value?

Exercise 10.12. Using Eqn. (10.14) as the basis, develop and implement an algorithm that computes the area of a region from its 8-chain-encoded contour (see also [263], [127, Sec. 19.5]).

Exercise 10.13. Modify Alg. 10.3 such that the outer and inner contours are not returned as individual lists (C_out, C_in) but as a composite tree structure. An outer contour thus represents a region that may contain zero, one, or more inner contours (i.e., holes). Each inner contour may again contain other regions (i.e., outer contours), and so on.

Exercise 10.14. Sketch an example binary region where the centroid does not lie inside the region itself.

Exercise 10.15. Implement the binary region moment features proposed by Hu (Eqn. (10.47)) and/or Flusser (Eqn. (10.51)) and verify that they are invariant under image scaling and rotation. Use the test image in Fig. 10.25²¹ (or create your own), which contains rotated and mirrored instances of the reference shapes, in addition to other (unknown) shapes.

Exercise 10.16. Implement the Mahalanobis distance calculation, as defined in Eqn. (10.54), for measuring the similarity between shape moment vectors.

A. Compute the covariance matrix Σ (see Sec. D.3 in the Appendix) for the m = 11 Flusser shape features ψ1, ..., ψ11 of the reference images in Table 10.1. Calculate and tabulate the inter-class Mahalanobis distances for the reference shapes, analogous to the example in Table 10.2.

²¹ Images are available on the book's website.
Fig. 10.25

Test image for moment-based shape matching. Reference shapes (top) and test image (bottom) composed of rotated and/or scaled shapes from the Kimia database and additional (unclassified) shapes.


B. Extend your analysis to a larger set of 500-1000 shapes (e.g., from the Kimia dataset [134], which contains more than 20 000 binary shape images). Calculate the normalized moment features and the covariance matrix Σ for the entire image set. Calculate the inter-class distance matrices for (a) the Euclidean and (b) the Mahalanobis distance. Display the distance matrices as grayscale images (FloatProcessor) and interpret them.


Exercise 10.17. There are alternative definitions for the eccentricity of a region (Eqn. (10.34)); for example [128, p. 394],

$$\mathrm{Ecc}(\mathcal{R}) = \frac{[\mu_{20}(\mathcal{R}) - \mu_{02}(\mathcal{R})]^2 + 4\cdot\mu_{11}^2(\mathcal{R})}{[\mu_{20}(\mathcal{R}) + \mu_{02}(\mathcal{R})]^2}. \tag{10.58}$$

Implement this version as well as the one in Eqn. (10.34) and contrast the results using suitably designed regions. Determine the numeric range of these quantities and test if they are really rotation and scale-invariant.


Exercise 10.18. Write an ImageJ plugin that (a) finds (labels) all regions in a binary image, (b) computes the orientation and eccentricity for each region, and (c) shows the results as a direction vector and the equivalent ellipse on top of each region (as exemplified in Fig. 10.19). Hint: Use Eqn. (10.39) to develop a method for drawing ellipses at arbitrary orientations (not available in ImageJ).

Exercise 10.19. The Java method in Prog. 10.4 computes an image's horizontal and vertical projections. The scheme described in Sec. 10.6.3 and illustrated in Fig. 10.20 can be used to calculate projections along arbitrary directions θ. Develop and implement such a process and display the resulting projections.
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Exercise  10.20.  Text  recognition  (OCR)  methods  are  likely  to  fail 
if  the  document  image  is  not  perfectly  axis-aligned.  One  method  for 
estimating  the  skew  angle  of  a  text  document  is  to  perform  binary 
segmentation  and  connected  components  analysis  (see  Fig.  10.26): 

•  Smear  the  original  binary  image  by  applying  a  disk-shaped 
morphological  dilation  with  a  specified  radius  (see  Chapter  9, 
Sec.  9.2.3).  The  aim  is  to  close  the  gaps  between  neighboring 
glyphs  without  closing  the  space  between  adjacent  text  lines  (Fig. 
10.26(b)) 

•  Apply  region  segmentation  to  the  resulting  image  and  calculate 
the  orientation  6(H)  and  the  eccentricity  E(H)  of  each  region  7Z 
(see  Secs.  10.6.1  and  10.6.2).  Ignore  all  regions  that  are  either 
too  small  or  not  sufficiently  elongated. 

•  Estimate the global skew angle by averaging the regions’ orientations θ_j. Note that, since angles are circular, they cannot be averaged in the usual way (see Chapter 15, Eqn. (15.14) for how to calculate the mean of a circular quantity). Consider using the eccentricity as a weight for the contribution of the associated region to the global average.

•  Obviously, this scheme is sensitive to outliers, that is, to angles that deviate strongly from the average orientation. Try to improve this estimate (i.e., make it more robust and accurate) by iteratively removing angles that are “too far” from the average orientation and then recalculating the result.
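The averaging and outlier-removal steps above can be sketched in Java as follows. This is a minimal sketch, not a reference solution: it assumes that region orientations are given in radians and are axial (θ and θ + π describe the same line), so the angles are doubled before taking the weighted circular mean and halved afterwards. The class and method names (`SkewEstimator`, `meanOrientation`, `robustMeanOrientation`) and the `maxDev` cutoff are hypothetical.

```java
import java.util.Arrays;

/** Hypothetical helper for Exercise 10.20: weighted mean of axial orientations. */
final class SkewEstimator {

    /**
     * Weighted circular mean of the orientations theta[i] (radians) for which
     * use[i] is true. Angles are doubled before averaging because orientations
     * are only defined modulo pi; the result lies in [-pi/2, pi/2].
     */
    static double meanOrientation(double[] theta, double[] weight, boolean[] use) {
        double sx = 0, sy = 0;
        for (int i = 0; i < theta.length; i++) {
            if (use[i]) {
                sx += weight[i] * Math.cos(2 * theta[i]);
                sy += weight[i] * Math.sin(2 * theta[i]);
            }
        }
        return 0.5 * Math.atan2(sy, sx);
    }

    /** Smallest absolute difference between two axial angles (modulo pi). */
    static double axialDistance(double a, double b) {
        double d = Math.abs(a - b) % Math.PI;
        return Math.min(d, Math.PI - d);
    }

    /**
     * Repeatedly removes orientations farther than maxDev from the current
     * weighted mean and recomputes the mean, until no angle is removed.
     */
    static double robustMeanOrientation(double[] theta, double[] weight, double maxDev) {
        boolean[] use = new boolean[theta.length];
        Arrays.fill(use, true);
        while (true) {
            double mean = meanOrientation(theta, weight, use);
            boolean changed = false;
            for (int i = 0; i < theta.length; i++) {
                if (use[i] && axialDistance(theta[i], mean) > maxDev) {
                    use[i] = false;   // drop this outlier
                    changed = true;
                }
            }
            if (!changed) {
                return mean;
            }
        }
    }
}
```

The region eccentricities could be passed as `weight`, as suggested in the exercise; each outer iteration either removes at least one angle or returns, so the loop terminates.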

Exercise 10.21. Draw the tree structure, defined by the recursive nesting of outer and inner regions, for each of the markers shown in Fig. 10.23. Based on this graph structure, suggest an algorithm for matching pairs of markers or, alternatively, for retrieving the best-matching marker from a database of markers.
Fig. 10.26

Document skew estimation example (see Exercise 10.20). Original binary image (a); result of applying a disk-shaped morphological dilation with radius 3.0 (b); region orientation vectors (c); histogram of the orientation angle θ (d). The real skew angle in this scan is approximately 1.1°.
11 Automatic Thresholding


Although  techniques  based  on  binary  image  regions  have  been  used 
for  a  very  long  time,  they  still  play  a  major  role  in  many  practical 
image  processing  applications  today  because  of  their  simplicity  and 
efficiency.  To  obtain  a  binary  image,  the  first  and  perhaps  most 
critical  step  is  to  convert  the  initial  grayscale  (or  color)  image  to  a 
binary  image,  in  most  cases  by  performing  some  form  of  thresholding 
operation,  as  described  in  Chapter  4,  Sec.  4.1.4. 

Anyone who has ever tried to convert a scanned document image to a readable binary image has experienced how sensitively the result depends on the proper choice of the threshold value. This chapter deals with finding the best threshold automatically, using only the information contained in the image, i.e., in an “unsupervised” fashion. This may be a single, “global” threshold that is applied to the whole image or different thresholds for different parts of the image. In the latter case we talk about “adaptive” thresholding, which is particularly useful when the image exhibits a varying background due to uneven lighting, exposure, or viewing conditions.

Automatic thresholding is a traditional and still very active area of research that had its peak in the 1980s and 1990s. Numerous techniques have been developed for this task, ranging from simple ad-hoc solutions to complex algorithms with firm theoretical foundations, as documented in several reviews and evaluation studies [86,178,204,213,231]. Binarization of images is also considered a “segmentation” technique and is thus often categorized under this term. In the following, we describe some representative and popular techniques in greater detail, starting in Sec. 11.1 with global thresholding methods and continuing with adaptive methods in Sec. 11.2.


11.1  Global  Histogram-Based  Thresholding 

Given a grayscale image $I$, the task is to find a single “optimal” threshold value for binarizing this image. Applying a particular threshold $q$ is equivalent to classifying each pixel as being either part of the background or the foreground. Thus the set of all image pixels is partitioned into two disjoint sets $C_0$ and $C_1$, where $C_0$ contains all elements with values in $[0, 1, \ldots, q]$ and $C_1$ collects the remaining elements with values in $[q+1, \ldots, K-1]$, that is,

$$
(u,v) \in \begin{cases} C_0 & \text{if } I(u,v) \le q \ \text{(background)},\\ C_1 & \text{if } I(u,v) > q \ \text{(foreground)}. \end{cases} \qquad (11.1)
$$

© Springer-Verlag London 2016
W. Burger, M.J. Burge, Digital Image Processing, Texts in Computer Science,
DOI 10.1007/978-1-4471-6684-9_11

Fig. 11.1

Test images used for subsequent thresholding experiments. Detail from a manuscript by Johannes Kepler (a), document with fingerprint (b), ARToolkit marker (c), synthetic two-level Gaussian mixture image (d). Results of thresholding with the fixed threshold value q = 128 (e–h). Histograms of the original images (i–l) with intensity values from 0 (left) to 255 (right).


Of course, the meaning of background and foreground may differ from one application to another. For example, the aforementioned scheme is quite natural for astronomical or thermal images, where the relevant “foreground” pixels are bright and the background is dark. Conversely, in document analysis, the objects of interest are usually the dark letters or artwork printed on a bright background. This is not a real restriction, since one can always invert the image to fit the above scheme, so there is no loss of generality here.
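Eqn. (11.1) and the inversion mentioned above translate directly into code. The following Java sketch assumes a grayscale image stored as a hypothetical `int[][]` array indexed as `img[v][u]` (row v, column u) with values in [0, K−1], and encodes background as 0 and foreground as 1; the class and method names are illustrative only and are not ImageJ’s API. It also builds the histogram h on which all histogram-based methods in this chapter operate.

```java
/** Illustrative helpers for Eqn. (11.1); not part of any library. */
final class Binarize {

    /** Histogram h(g) of an image with pixel values in [0, K-1]. */
    static int[] histogram(int[][] img, int K) {
        int[] h = new int[K];
        for (int[] row : img) {
            for (int g : row) {
                h[g]++;
            }
        }
        return h;
    }

    /** Eqn. (11.1): I(u,v) <= q goes to the background (0), otherwise foreground (1). */
    static int[][] threshold(int[][] img, int q) {
        int[][] out = new int[img.length][];
        for (int v = 0; v < img.length; v++) {
            out[v] = new int[img[v].length];
            for (int u = 0; u < img[v].length; u++) {
                out[v][u] = (img[v][u] <= q) ? 0 : 1;
            }
        }
        return out;
    }

    /** Inverts the image (g -> K-1-g), e.g., to turn dark text into bright foreground. */
    static int[][] invert(int[][] img, int K) {
        int[][] out = new int[img.length][];
        for (int v = 0; v < img.length; v++) {
            out[v] = new int[img[v].length];
            for (int u = 0; u < img[v].length; u++) {
                out[v][u] = (K - 1) - img[v][u];
            }
        }
        return out;
    }
}
```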

Figure 11.1 shows several test images used in this chapter and the result of thresholding with a fixed threshold value. The synthetic image in Fig. 11.1(d) is a mixture of two Gaussian random distributions $\mathcal{N}_0$, $\mathcal{N}_1$ for the background and foreground, respectively, with $\mu_0 = 80$, $\mu_1 = 170$, $\sigma_0 = \sigma_1 = 20$. The corresponding histograms of the test images are shown in Fig. 11.1(i–l). Note that all histograms are normalized to constant area (not to maximum values, as usual), with intensity values ranging from 0 (left) to 255 (right).

The key question is how to find a suitable (or even “optimal”) threshold value for binarizing the image. As the name implies, histogram-based methods calculate the threshold primarily from the information contained in the image’s histogram, without inspecting the actual image pixels. Other methods process individual pixels for finding the threshold, and there are also hybrid methods that rely both on the histogram and the local image content. Histogram-based techniques are usually simple and efficient, because they operate on a small set of data (256 values in the case of an 8-bit histogram); they can be grouped into two main categories: shape-based and statistical methods.

Shape-based methods analyze the structure of the histogram’s distribution, for example by trying to locate peaks, valleys, and other “shape” features. Usually the histogram is first smoothed to eliminate narrow peaks and gaps. While shape-based methods were quite popular early on, they are usually not as robust as their statistical counterparts or at least do not seem to offer any distinct advantages. A classic representative of this category is the “triangle” (or “chord”) algorithm described in [261]. References to numerous other shape-based methods can be found in [213].

Statistical methods, as their name suggests, rely on statistical information derived from the image’s histogram (which of course is a statistic itself), such as the mean, variance, or entropy. In the next section, we discuss a few elementary parameters that can be obtained from the histogram, followed by a description of concrete algorithms that use this information. Again, there is a vast number of similar methods, and we have selected four representative algorithms to be described in more detail: (a) iterative threshold selection by Ridler and Calvard [198], (b) Otsu’s clustering method [177], (c) the minimum error method by Kittler and Illingworth [116], and (d) the maximum entropy thresholding method by Kapur, Sahoo, and Wong [133].

11.1.1  Image  Statistics  from  the  Histogram 

As described in Chapter 3, Sec. 3.7, several statistical quantities, such as the arithmetic mean, variance, and median, can be calculated directly from the histogram, without reverting to the original image data. If we threshold the image at level $q$ ($0 \le q < K$), the set of pixels is partitioned into the disjoint subsets $C_0$, $C_1$, corresponding to the background and the foreground. The number of pixels assigned to each subset is

$$
n_0(q) = |C_0| = \sum_{g=0}^{q} h(g) \quad\text{and}\quad n_1(q) = |C_1| = \sum_{g=q+1}^{K-1} h(g), \qquad (11.2)
$$

respectively. Also, because all pixels are assigned to either the background set $C_0$ or the foreground set $C_1$,

$$
n_0(q) + n_1(q) = |C_0| + |C_1| = |C_0 \cup C_1| = MN. \qquad (11.3)
$$


For any threshold $q$, the mean values of the associated partitions $C_0$, $C_1$ can be calculated from the image histogram as

$$
\mu_0(q) = \frac{1}{n_0(q)} \cdot \sum_{g=0}^{q} g \cdot h(g), \qquad (11.4)
$$

$$
\mu_1(q) = \frac{1}{n_1(q)} \cdot \sum_{g=q+1}^{K-1} g \cdot h(g), \qquad (11.5)
$$
and these quantities relate to the image’s overall mean $\mu_I$ (Eqn. (3.9)) by¹

$$
\mu_I = \frac{1}{MN} \cdot \bigl[ n_0(q) \cdot \mu_0(q) + n_1(q) \cdot \mu_1(q) \bigr] = \mu_0(K-1). \qquad (11.6)
$$
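Eqns. (11.2)–(11.6) can be checked numerically with a few lines of code. The following Java sketch uses a histogram stored as a plain `int[]` (an assumption; the class name `PartitionStats` is illustrative) and recombines the overall mean from the two partition means for every threshold q, as in Eqn. (11.6).

```java
/** Illustrative sketch of Eqns. (11.2)-(11.6), computed from a histogram h. */
final class PartitionStats {

    /** Number of pixels with values in [a, b], cf. Eqn. (11.2). */
    static long count(int[] h, int a, int b) {
        long n = 0;
        for (int g = a; g <= b; g++) {
            n += h[g];
        }
        return n;
    }

    /** Mean of pixels with values in [a, b], cf. Eqns. (11.4), (11.5); NaN if empty. */
    static double mean(int[] h, int a, int b) {
        long n = 0, s = 0;
        for (int g = a; g <= b; g++) {
            n += h[g];
            s += (long) g * h[g];
        }
        return (n > 0) ? (double) s / n : Double.NaN;
    }

    /** Overall mean recombined from the two partitions, Eqn. (11.6). */
    static double overallMean(int[] h, int q) {
        int K = h.length;
        long n0 = count(h, 0, q);
        long n1 = count(h, q + 1, K - 1);
        double sum = 0;                      // an empty partition contributes nothing
        if (n0 > 0) sum += n0 * mean(h, 0, q);
        if (n1 > 0) sum += n1 * mean(h, q + 1, K - 1);
        return sum / (n0 + n1);              // n0 + n1 = MN, Eqn. (11.3)
    }
}
```

For the small histogram h = (2, 0, 3, 1), `overallMean(h, q)` gives 1.5 for every q, which equals `mean(h, 0, 3)`, i.e., $\mu_0(K-1)$.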


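Analogously, the variances of the background and foreground partitions can be extracted from the histogram as²

$$
\begin{aligned}
\sigma_0^2(q) &= \frac{1}{n_0(q)} \cdot \sum_{g=0}^{q} \bigl(g - \mu_0(q)\bigr)^2 \cdot h(g),\\
\sigma_1^2(q) &= \frac{1}{n_1(q)} \cdot \sum_{g=q+1}^{K-1} \bigl(g - \mu_1(q)\bigr)^2 \cdot h(g).
\end{aligned} \qquad (11.7)
$$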

(Of course, as in Eqn. (3.12), this calculation can also be performed in a single pass and without knowing $\mu_0(q)$, $\mu_1(q)$ in advance.) The overall variance $\sigma_I^2$ for the whole image is identical to the variance of the background for $q = K-1$,

$$
\sigma_I^2 = \frac{1}{MN} \cdot \sum_{g=0}^{K-1} (g - \mu_I)^2 \cdot h(g) = \sigma_0^2(K-1), \qquad (11.8)
$$

that is, when all pixels are assigned to the background partition. Note that, unlike the simple relation of the means given in Eqn. (11.6),

$$
\sigma_I^2 \neq \frac{1}{MN} \cdot \bigl[ n_0(q) \cdot \sigma_0^2(q) + n_1(q) \cdot \sigma_1^2(q) \bigr] \qquad (11.9)
$$

in general (see also Eqn. (11.20)).
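The single-pass calculation mentioned above can be sketched as follows. This Java sketch assumes a histogram stored as `int[]` and uses the shortcut $\sigma^2 = \frac{1}{n}\sum g^2 h(g) - \mu^2$, accumulating three sums in one loop; the class name is illustrative. The shortcut can lose precision through cancellation for large value ranges, which should be harmless with `double` sums on 8-bit data, but that is an assumption worth checking for other data.

```java
/** Illustrative single-pass partition statistics, cf. Eqns. (11.7)-(11.9). */
final class PartitionVariance {

    /**
     * Returns {n, mean, variance} for pixels with values in [a, b]. The sums
     * of h(g), g*h(g), and g^2*h(g) are accumulated in a single pass; mean and
     * variance are NaN if the range contains no pixels.
     */
    static double[] stats(int[] h, int a, int b) {
        long n = 0;
        double s1 = 0, s2 = 0;
        for (int g = a; g <= b; g++) {
            n += h[g];
            s1 += (double) g * h[g];
            s2 += (double) g * g * h[g];
        }
        if (n == 0) {
            return new double[] {0, Double.NaN, Double.NaN};
        }
        double mu = s1 / n;
        return new double[] {n, mu, s2 / n - mu * mu};
    }
}
```

For h = (2, 0, 3, 1) and q = 1, the whole-image variance is 1.25, while the population-weighted average of the partition variances is only 0.125, which illustrates the inequality in Eqn. (11.9).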

We will use these basic relations in the discussion of histogram-based threshold selection algorithms in the following and add more specific ones as we go along.


11.1.2  Simple  Threshold  Selection 

Clearly, the choice of the threshold value should not be fixed but somehow based on the content of the image. In the simplest case, we could use the mean of all image pixels,

$$
q \leftarrow \operatorname{mean}(I) = \mu_I, \qquad (11.10)
$$

as the threshold value $q$, or the median (see Sec. 3.7.2),

$$
q \leftarrow \operatorname{median}(I) = m_I, \qquad (11.11)
$$

or, alternatively, the average of the minimum and the maximum (mid-range value), that is,

$$
q \leftarrow \frac{\max(I) + \min(I)}{2}. \qquad (11.12)
$$
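All three simple thresholds can be obtained from the histogram alone. The following Java sketch assumes a non-empty histogram stored as `int[]`; rounding the mean and mid-range values down to an integer threshold, and defining the median as the smallest g whose cumulative count reaches half of all pixels, are assumptions of this sketch. The class name is illustrative.

```java
/** Illustrative histogram-based versions of Eqns. (11.10)-(11.12); h must not be all zeros. */
final class SimpleThresholds {

    /** Eqn. (11.10), rounded down to an integer threshold. */
    static int meanThreshold(int[] h) {
        long n = 0, s = 0;
        for (int g = 0; g < h.length; g++) {
            n += h[g];
            s += (long) g * h[g];
        }
        return (int) (s / n);
    }

    /** Eqn. (11.11): smallest g whose cumulative count reaches half of all pixels. */
    static int medianThreshold(int[] h) {
        long n = 0;
        for (int v : h) {
            n += v;
        }
        long c = 0;
        for (int g = 0; g < h.length; g++) {
            c += h[g];
            if (2 * c >= n) {
                return g;
            }
        }
        return h.length - 1;   // not reached for a non-empty histogram
    }

    /** Eqn. (11.12): midpoint of the smallest and largest occupied bins, rounded down. */
    static int midRangeThreshold(int[] h) {
        int lo = 0, hi = h.length - 1;
        while (h[lo] == 0) lo++;   // min(I): smallest non-zero histogram entry
        while (h[hi] == 0) hi--;   // max(I): largest non-zero histogram entry
        return (lo + hi) / 2;
    }
}
```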


¹ Note that $\mu_0(q)$, $\mu_1(q)$ are meant to be functions over $q$ and thus $\mu_0(K-1)$ in Eqn. (11.6) denotes the mean of partition $C_0$ for the threshold $K-1$.

² $\sigma_0^2(q)$ and $\sigma_1^2(q)$ in Eqn. (11.7) are also functions over $q$.
1:  QuantileThreshold(h, p)
    Input: h : [0, K−1] ↦ ℕ, a grayscale histogram; p, the proportion
    of expected background pixels (0 < p < 1). Returns the optimal
    threshold value or −1 if no threshold is found.
2:      K ← Size(h)                          ▷ number of intensity levels
3:      MN ← Σ_{i=0}^{K−1} h(i)              ▷ number of image pixels
4:      i ← 0
5:      c ← h(0)
6:      while (i < K) ∧ (c < MN · p) do      ▷ quantile calc. (Eqn. (11.13))
7:          i ← i + 1
8:          c ← c + h(i)
9:      if c < MN then                       ▷ foreground is non-empty
10:         q ← i
11:     else                                 ▷ foreground is empty, all pixels are background
12:         q ← −1
13:     return q


Like the image mean $\mu_I$ (see Eqn. (3.9)), all these quantities can be obtained directly from the histogram $h$.

Thresholding at the median segments the image into approximately equal-sized background and foreground sets, that is, $|C_0| \approx |C_1|$, which assumes that the “interesting” (foreground) pixels cover about half of the image. This may be appropriate for certain images, but completely wrong for others. For example, a scanned text image will typically contain a lot more white than black pixels, so using the median threshold would probably be unsatisfactory in this case. If the approximate fraction $p$ ($0 < p < 1$) of expected background pixels is known in advance, the threshold could be set to that quantile instead. In this case, $q$ is simply chosen as

$$
q \leftarrow \min \Bigl\{ i \;\Big|\; \sum_{j=0}^{i} h(j) \ge MN \cdot p \Bigr\}, \qquad (11.13)
$$

where $MN$ is the total number of pixels. We see that the median is only a special case of a quantile measure, with $p = 0.5$. This simple thresholding method is summarized in Alg. 11.1.
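Alg. 11.1 maps almost line by line onto Java. This is a minimal sketch assuming a histogram stored as `int[]` and 0 < p < 1; the class name is illustrative. The loop bound `i < K - 1` is a small defensive change that keeps `h[i]` in range; for 0 < p < 1 it never ends the search early, because at i = K−1 the cumulative count equals MN, which is already at least MN · p.

```java
/** Illustrative Java version of Alg. 11.1 (quantile thresholding). */
final class QuantileThreshold {

    /**
     * Returns the smallest q whose cumulative count reaches MN * p (Eqn. (11.13)),
     * or -1 if that q would leave the foreground empty. Assumes 0 < p < 1.
     */
    static int quantileThreshold(int[] h, double p) {
        int K = h.length;
        long mn = 0;                          // number of image pixels (MN)
        for (int v : h) {
            mn += v;
        }
        int i = 0;
        long c = h[0];
        while (i < K - 1 && c < mn * p) {     // quantile calculation
            i++;
            c += h[i];
        }
        return (c < mn) ? i : -1;             // -1: foreground would be empty
    }
}
```

With p = 0.5 this returns the median; for h = (2, 0, 3, 1) that is q = 2.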

For the mid-range technique (Eqn. (11.12)), the limiting intensity values $\min(I)$ and $\max(I)$ can be found by searching for the smallest and largest non-zero entries, respectively, in the histogram $h$. The mid-range threshold segments the image at 50% (or any other percentile) of the contrast range. In this case, nothing can be said in general about the relative sizes of the resulting background and foreground partitions. Because a single extreme pixel value (outlier) may change the contrast range dramatically, this approach is not very robust. Here too it is advantageous to define the contrast range by specifying pixel quantiles, analogous to the calculation of the quantities $\hat{a}_{\mathrm{low}}$ and $\hat{a}_{\mathrm{high}}$ in the modified auto-contrast function (see Ch. 4, Sec. 4.4).
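A quantile-based mid-range threshold might look as follows. This Java sketch is only in the spirit of the limits $\hat{a}_{\mathrm{low}}$, $\hat{a}_{\mathrm{high}}$ of Sec. 4.4: it assumes the same fraction p is clipped at both ends of the histogram, which may differ in detail from the book’s definitions. It assumes a non-empty histogram and 0 < p < 0.5; the class name is hypothetical.

```java
/** Illustrative quantile-based ("robust") variant of the mid-range threshold. */
final class RobustMidRange {

    /**
     * aLow: smallest value whose cumulative count (from the bottom) reaches MN * p;
     * aHigh: largest value whose cumulative count from the top reaches MN * p.
     * Returns their midpoint, rounded down.
     */
    static int threshold(int[] h, double p) {
        int K = h.length;
        long mn = 0;
        for (int v : h) {
            mn += v;
        }
        int aLow = 0;
        long c = h[0];
        while (aLow < K - 1 && c < mn * p) {  // skip the darkest p * MN pixels
            aLow++;
            c += h[aLow];
        }
        int aHigh = K - 1;
        c = h[K - 1];
        while (aHigh > 0 && c < mn * p) {     // skip the brightest p * MN pixels
            aHigh--;
            c += h[aHigh];
        }
        return (aLow + aHigh) / 2;
    }
}
```

In a histogram with a single outlier pixel at 0 and the remaining pixels at 100 and 150, the plain mid-range threshold is 75, whereas this variant with p = 0.02 ignores the outlier and yields 125.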

In the pathological (but nevertheless possible) case that all pixels in the image have the same intensity g, all the aforementioned methods will return the threshold q = g, which assigns all pixels to the background partition and leaves the foreground empty. Algorithms should try to detect this situation, because thresholding a uniform image obviously makes no sense. Results obtained with these simple thresholding techniques are shown in Fig. 11.2. Despite the obvious limitations, even a simple automatic threshold selection (such as the quantile technique in Alg. 11.1) will typically yield more reliable results than the use of a fixed threshold.

Alg. 11.1

Quantile thresholding. The optimal threshold value $q \in [0, K-2]$ is returned, or −1 if no valid threshold was found. Note the test in line 9 to check if the foreground is empty or not (the background is always non-empty by definition).

Fig. 11.2

Results from various simple thresholding schemes. Mean (a–d), median (e–h), and mid-range (i–l) threshold, as specified in Eqns. (11.10)–(11.12).
[Figure: binarized test images in three rows labeled Arithmetic mean, Median, and Mid-range, each panel annotated with its threshold value q.]

11.1.3  Iterative  Threshold  Selection  (Isodata  Algorithm) 

This classic iterative algorithm for finding an optimal threshold is attributed to Ridler and Calvard [198] and was related to Isodata clustering by Velasco [242]. It is thus sometimes referred to as the “isodata” or “intermeans” algorithm. As in many other global thresholding schemes, it is assumed that the image’s histogram is a mixture of two separate distributions, one for the intensities of the background pixels and the other for the foreground pixels. In this case, the two distributions are assumed to be Gaussian with approximately identical spreads (variances).

The algorithm starts by making an initial guess for the threshold, for example, by taking the mean or the median of the whole image. This splits the set of pixels into a background and a foreground set, both of which should be non-empty. Next, the means of both sets are calculated and the threshold is repositioned to their average, that is, centered between the two means. The means are then re-calculated for the resulting background and foreground sets, and so on, until the threshold does not change any longer. In practice, it takes only a few iterations for the threshold to converge.

1:  IsodataThreshold(h)
    Input: h : [0, K−1] ↦ ℕ, a grayscale histogram.
    Returns the optimal threshold value or −1 if no threshold is found.
2:      K ← Size(h)                          ▷ number of intensity levels
3:      q ← Mean(h, 0, K−1)                  ▷ set initial threshold to overall mean
4:      repeat
5:          n0 ← Count(h, 0, q)              ▷ background population
6:          n1 ← Count(h, q+1, K−1)          ▷ foreground population
7:          if (n0 = 0) ∨ (n1 = 0) then      ▷ backgrd. or foregrd. is empty
8:              return −1
9:          μ0 ← Mean(h, 0, q)               ▷ background mean
10:         μ1 ← Mean(h, q+1, K−1)           ▷ foreground mean
11:         q′ ← q                           ▷ keep previous threshold
12:         q ← ⌊(μ0 + μ1)/2⌋                ▷ calculate the new threshold
13:     until q = q′                         ▷ terminate if no change
14:     return q

15: Count(h, a, b) := Σ_{g=a}^{b} h(g)
16: Mean(h, a, b) := (Σ_{g=a}^{b} g · h(g)) / (Σ_{g=a}^{b} h(g))

This procedure is summarized in Alg. 11.2. The initial threshold is set to the overall mean (line 3). For each threshold $q$, separate mean values $\mu_0$, $\mu_1$ are computed for the corresponding background and foreground partitions. The threshold is repeatedly set to the average of the two means until no more change occurs. The clause in line 7 tests if either the background or the foreground partition is empty, which will happen, for example, if the image contains only a single intensity value. In this case, no valid threshold exists and the procedure returns −1. The functions Count(h, a, b) and Mean(h, a, b) in lines 15–16 return the number of pixels and the mean, respectively, of the image pixels with intensity values in the range [a, b]. Both can be computed directly from the histogram h without inspecting the image itself.
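Alg. 11.2 can be sketched in Java as shown below, assuming a histogram stored as `int[]`. Rounding the initial mean down to an integer and the iteration cap `MAX_ITERATIONS` are additions of this sketch (the cap only guards against non-termination and is not part of Alg. 11.2); the class name is illustrative.

```java
/** Illustrative Java version of Alg. 11.2 ("isodata" threshold selection). */
final class IsodataThreshold {

    static final int MAX_ITERATIONS = 1000;   // safety cap, not part of Alg. 11.2

    /** Count(h, a, b): number of pixels with values in [a, b]. */
    static long count(int[] h, int a, int b) {
        long n = 0;
        for (int g = a; g <= b; g++) {
            n += h[g];
        }
        return n;
    }

    /** Mean(h, a, b): mean of pixels with values in [a, b]; caller ensures non-empty. */
    static double mean(int[] h, int a, int b) {
        long n = 0, s = 0;
        for (int g = a; g <= b; g++) {
            n += h[g];
            s += (long) g * h[g];
        }
        return (double) s / n;
    }

    static int threshold(int[] h) {
        int K = h.length;
        if (count(h, 0, K - 1) == 0) {
            return -1;                                   // empty histogram
        }
        int q = (int) mean(h, 0, K - 1);                 // initial guess: overall mean
        for (int iter = 0; iter < MAX_ITERATIONS; iter++) {
            long n0 = count(h, 0, q);                    // background population
            long n1 = count(h, q + 1, K - 1);            // foreground population
            if (n0 == 0 || n1 == 0) {
                return -1;                               // one partition is empty
            }
            double mu0 = mean(h, 0, q);
            double mu1 = mean(h, q + 1, K - 1);
            int qNew = (int) Math.floor((mu0 + mu1) / 2);
            if (qNew == q) {
                return q;                                // no change: converged
            }
            q = qNew;
        }
        return q;
    }
}
```

Each iteration rescans the histogram, so a single iteration costs O(K); the table-based version discussed next avoids this.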

Alg. 11.2

“Isodata” threshold selection based on the iterative method by Ridler and Calvard [198].

Alg. 11.3

Fast version of “isodata” threshold selection. Precalculated tables are used for the background and foreground means $\mu_0$ and $\mu_1$, respectively.

1:  FastIsodataThreshold(h)
    Input: h : [0, K−1] ↦ ℕ, a grayscale histogram.
    Returns the optimal threshold value or −1 if no threshold is found.
2:      K ← Size(h)                          ▷ number of intensity levels
3:      (μ0, μ1, N) ← MakeMeanTables(h)
4:      q ← ⌊μ0(K−1)⌋                        ▷ take the overall mean as initial threshold
5:      repeat
6:          if (μ0(q) < 0) ∨ (μ1(q) < 0) then
7:              return −1                    ▷ background or foreground is empty
8:          q′ ← q                           ▷ keep previous threshold
9:          q ← ⌊(μ0(q) + μ1(q))/2⌋          ▷ calculate the new threshold
10:     until q = q′                         ▷ terminate if no change
11:     return q

12: MakeMeanTables(h)
13:     K ← Size(h)
14:     Create maps μ0, μ1 : [0, K−1] ↦ ℝ
15:     n0 ← 0, s0 ← 0
16:     for q ← 0, …, K−1 do                 ▷ tabulate background means μ0(q)
17:         n0 ← n0 + h(q)
18:         s0 ← s0 + q · h(q)
19:         μ0(q) ← s0/n0 if n0 > 0, −1 otherwise
20:     N ← n0
21:     n1 ← 0, s1 ← 0
22:     μ1(K−1) ← 0
23:     for q ← K−2, …, 0 do                 ▷ tabulate foreground means μ1(q)
24:         n1 ← n1 + h(q+1)
25:         s1 ← s1 + (q+1) · h(q+1)
26:         μ1(q) ← s1/n1 if n1 > 0, −1 otherwise
27:     return (μ0, μ1, N)

Fig. 11.3

Thresholding with the isodata algorithm. Binarized images and the corresponding optimal threshold values (q): (a) q = 128, (b) q = 125, (c) q = 94, (d) q = 90.

The performance of the isodata algorithm can be easily improved by using tables $\mu_0(q)$, $\mu_1(q)$ for the background and foreground means, respectively. The modified, table-based version of the iterative threshold selection procedure is shown in Alg. 11.3. It requires two passes over the histogram to initialize the tables $\mu_0$, $\mu_1$ and only a small, constant number of computations for each iteration in its main loop. Note that the image’s overall mean $\mu_I$, used as the initial guess for the threshold $q$ (Alg. 11.3, line 4), need not be calculated separately but can be obtained as $\mu_I = \mu_0(K-1)$, given that the threshold $q = K-1$ assigns all image pixels to the background. The time complexity of this algorithm is thus $O(K)$, that is, linear w.r.t. the size of the histogram. Figure 11.3 shows the results of thresholding with the isodata algorithm applied to the test images in Fig. 11.1.
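The table-based procedure of Alg. 11.3 can be sketched in Java as follows, again assuming a histogram stored as `int[]`. Two details deviate from the pseudocode and are assumptions of this sketch: the entry μ1(K−1) is set to −1 (the foreground is empty for q = K−1) instead of 0, and the main loop carries the same hypothetical iteration cap as before. The count N returned by MakeMeanTables is not needed here and is omitted.

```java
/** Illustrative Java version of Alg. 11.3 (table-based "isodata" thresholding). */
final class FastIsodataThreshold {

    static final int MAX_ITERATIONS = 1000;   // safety cap, not part of Alg. 11.3

    /** Returns {mu0, mu1}: background/foreground means per threshold q, -1 if empty. */
    static double[][] makeMeanTables(int[] h) {
        int K = h.length;
        double[] mu0 = new double[K];
        double[] mu1 = new double[K];
        long n0 = 0, s0 = 0;
        for (int q = 0; q < K; q++) {                    // tabulate background means
            n0 += h[q];
            s0 += (long) q * h[q];
            mu0[q] = (n0 > 0) ? (double) s0 / n0 : -1;
        }
        long n1 = 0, s1 = 0;
        mu1[K - 1] = -1;             // foreground is empty for q = K-1 (Alg. 11.3 stores 0)
        for (int q = K - 2; q >= 0; q--) {               // tabulate foreground means
            n1 += h[q + 1];
            s1 += (long) (q + 1) * h[q + 1];
            mu1[q] = (n1 > 0) ? (double) s1 / n1 : -1;
        }
        return new double[][] {mu0, mu1};
    }

    static int threshold(int[] h) {
        int K = h.length;
        double[][] tables = makeMeanTables(h);
        double[] mu0 = tables[0];
        double[] mu1 = tables[1];
        if (mu0[K - 1] < 0) {
            return -1;                                   // empty histogram
        }
        int q = (int) Math.floor(mu0[K - 1]);            // overall mean as initial threshold
        for (int iter = 0; iter < MAX_ITERATIONS; iter++) {
            if (mu0[q] < 0 || mu1[q] < 0) {
                return -1;                               // background or foreground is empty
            }
            int qNew = (int) Math.floor((mu0[q] + mu1[q]) / 2);
            if (qNew == q) {
                return q;                                // no change: converged
            }
            q = qNew;
        }
        return q;
    }
}
```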

11.1.4  Otsu’s  Method 

The method proposed by Otsu [147, 177] also assumes that the original image contains pixels from two classes, whose intensity distributions are unknown. The goal is to find a threshold $q$ such that the resulting background and foreground distributions are maximally separated, which means that (a) they are each as narrow as possible (have minimal variances) and (b) their centers (means) are most distant from each other.

For a given threshold $q$, the variances of the corresponding background and foreground partitions can be calculated straight from the image’s histogram (see Eqn. (11.7)). The combined width of the two distributions is measured by the within-class variance

$$
\begin{aligned}
\sigma_w^2(q) &= P_0(q) \cdot \sigma_0^2(q) + P_1(q) \cdot \sigma_1^2(q) && (11.14)\\
&= \frac{1}{MN} \cdot \bigl[ n_0(q) \cdot \sigma_0^2(q) + n_1(q) \cdot \sigma_1^2(q) \bigr], && (11.15)
\end{aligned}
$$

where

$$
\begin{aligned}
P_0(q) &= \sum_{i=0}^{q} p(i) = \frac{1}{MN} \cdot \sum_{i=0}^{q} h(i) = \frac{n_0(q)}{MN}, && (11.16)\\
P_1(q) &= \sum_{i=q+1}^{K-1} p(i) = \frac{1}{MN} \cdot \sum_{i=q+1}^{K-1} h(i) = \frac{n_1(q)}{MN} && (11.17)
\end{aligned}
$$

are the class probabilities for $C_0$, $C_1$, respectively. Thus the within-class variance in Eqn. (11.15) is simply the sum of the individual variances weighted by the corresponding class probabilities or “populations”. Analogously, the between-class variance,

$$
\begin{aligned}
\sigma_b^2(q) &= P_0(q) \cdot \bigl(\mu_0(q) - \mu_I\bigr)^2 + P_1(q) \cdot \bigl(\mu_1(q) - \mu_I\bigr)^2 && (11.18)\\
&= \frac{1}{MN} \cdot \Bigl[ n_0(q) \cdot \bigl(\mu_0(q) - \mu_I\bigr)^2 + n_1(q) \cdot \bigl(\mu_1(q) - \mu_I\bigr)^2 \Bigr] && (11.19)
\end{aligned}
$$

measures the distances between the cluster means $\mu_0$, $\mu_1$ and the overall mean $\mu_I$. The total image variance $\sigma_I^2$ is the sum of the within-class variance and the between-class variance, that is,

$$
\sigma_I^2 = \sigma_w^2(q) + \sigma_b^2(q), \qquad (11.20)
$$

for $q = 0, \ldots, K-1$. Since $\sigma_I^2$ is constant for a given image, the threshold $q$ can be found by either minimizing the within-class variance $\sigma_w^2$ or maximizing the between-class variance $\sigma_b^2$. The natural choice is to maximize $\sigma_b^2$, because it relies only on first-order statistics (i.e., the within-class means $\mu_0$, $\mu_1$). Since the overall mean $\mu_I$ can be expressed as the weighted sum of the partition means $\mu_0$ and $\mu_1$ (Eqn. (11.6)), we can simplify Eqn. (11.19) to

$$
\begin{aligned}
\sigma_b^2(q) &= P_0(q) \cdot P_1(q) \cdot \bigl[\mu_0(q) - \mu_1(q)\bigr]^2 && (11.21)\\
&= \frac{1}{(MN)^2} \cdot n_0(q) \cdot n_1(q) \cdot \bigl[\mu_0(q) - \mu_1(q)\bigr]^2. && (11.22)
\end{aligned}
$$

The optimal threshold is finally found by maximizing the expression for the between-class variance in Eqn. (11.22) with respect to $q$, thereby minimizing the within-class variance in Eqn. (11.15).
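The simplification from Eqn. (11.18) to Eqn. (11.21) rests on two facts from the preceding definitions: $P_0(q) + P_1(q) = 1$ (from Eqns. (11.3), (11.16), (11.17)) and $\mu_I = P_0(q)\,\mu_0(q) + P_1(q)\,\mu_1(q)$ (Eqn. (11.6)). Dropping the argument $q$ for brevity, the intermediate steps are:

$$
\begin{aligned}
\mu_0 - \mu_I &= \mu_0 - P_0\,\mu_0 - P_1\,\mu_1 = P_1\,(\mu_0 - \mu_1),\\
\mu_1 - \mu_I &= \mu_1 - P_0\,\mu_0 - P_1\,\mu_1 = P_0\,(\mu_1 - \mu_0),\\
\sigma_b^2 &= P_0\,P_1^2\,(\mu_0 - \mu_1)^2 + P_1\,P_0^2\,(\mu_0 - \mu_1)^2
            = P_0\,P_1\,(P_0 + P_1)\,(\mu_0 - \mu_1)^2
            = P_0\,P_1\,(\mu_0 - \mu_1)^2 .
\end{aligned}
$$

Substituting $P_0 = n_0/MN$ and $P_1 = n_1/MN$ then gives Eqn. (11.22).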

Noting that $\sigma_b^2(q)$ only depends on the means (and not on the variances) of the two partitions for a given threshold $q$ allows for a very efficient implementation, as outlined in Alg. 11.4. The algorithm assumes a grayscale image with a total of $MN$ pixels and $K$ intensity levels. As in Alg. 11.3, precalculated tables $\mu_0(q)$, $\mu_1(q)$ are used for the background and foreground means for all possible threshold values $q = 0, \ldots, K-1$.

Alg. 11.4

Finding the optimal threshold using Otsu’s method [177]. Initially (outside the for-loop), the threshold q is assumed to be −1, which corresponds to the background class being empty (n0 = 0) and all pixels being assigned to the foreground class (n1 = MN). The for-loop (lines 7–14) examines each possible threshold q = 0, …, K−2. The factor 1/(MN)² in line 11 is constant and thus not relevant for the optimization. The optimal threshold value is returned, or −1 if no valid threshold was found. The function MakeMeanTables() is defined in Alg. 11.3.

1:  OtsuThreshold(h)
    Input: h : [0, K−1] ↦ ℕ, a grayscale histogram. Returns the
    optimal threshold value or −1 if no threshold is found.
2:      K ← Size(h)                          ▷ number of intensity levels
3:      (μ0, μ1, MN) ← MakeMeanTables(h)     ▷ see Alg. 11.3
4:      σ²_bmax ← 0
5:      q_max ← −1
6:      n0 ← 0
7:      for q ← 0, …, K−2 do                 ▷ examine all possible threshold values q
8:          n0 ← n0 + h(q)
9:          n1 ← MN − n0
10:         if (n0 > 0) ∧ (n1 > 0) then
11:             σ²_b ← 1/(MN)² · n0 · n1 · [μ0(q) − μ1(q)]²    ▷ see Eqn. (11.22)
12:             if σ²_b > σ²_bmax then       ▷ maximize σ²_b
13:                 σ²_bmax ← σ²_b
14:                 q_max ← q
15:     return q_max

Possible threshold values are $q = 0, \ldots, K-2$ (with $q = K-1$, all pixels are assigned to the background). Initially (before entering the main for-loop in line 7) $q = -1$; at this point, the set of background pixels ($\le q$) is empty and all pixels are classified as foreground ($n_0 = 0$ and $n_1 = MN$). Each possible threshold value is examined inside the body of the for-loop.

As long as either of the two classes is empty ($n_0(q) = 0$ or $n_1(q) = 0$),³ the resulting between-class variance $\sigma_b^2(q)$ is zero. The threshold that yields the maximum between-class variance ($\sigma^2_{b\max}$) is returned, or −1 if no valid threshold could be found. This occurs when all image pixels have the same intensity, that is, when all pixels are in either the background or the foreground class.

Note that in line 11 of Alg. 11.4, the factor $1/(MN)^2$ is constant (independent of $q$) and can thus be ignored in the optimization. However, care must be taken at this point because the computation of $\sigma_b^2$ may produce intermediate values that exceed the range of typical (32-bit) integer variables, even for medium-size images. For example, a 1000 × 1000 image split into two equal classes gives $n_0 \cdot n_1 = 500\,000 \cdot 500\,000 = 2.5 \cdot 10^{11}$, far beyond $2^{31} - 1 \approx 2.1 \cdot 10^9$. Variables of type long should be used, or the computation should be performed with floating-point values.
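Alg. 11.4, including the overflow precaution just described, can be sketched in Java as follows. The sketch assumes a histogram stored as `int[]`, recomputes the mean tables of MakeMeanTables inline so that it is self-contained (with −1 marking empty partitions, and μ1(K−1) set to −1 rather than 0, which the loop never reads anyway), and evaluates Eqn. (11.22) without the constant factor in `double` arithmetic. The class name is illustrative.

```java
/** Illustrative Java version of Alg. 11.4 (Otsu's method). */
final class OtsuThreshold {

    static int threshold(int[] h) {
        int K = h.length;
        // mean tables as in MakeMeanTables (Alg. 11.3); -1 marks an empty partition
        double[] mu0 = new double[K];
        double[] mu1 = new double[K];
        long n = 0, s = 0;
        for (int q = 0; q < K; q++) {
            n += h[q];
            s += (long) q * h[q];
            mu0[q] = (n > 0) ? (double) s / n : -1;
        }
        final long numPixels = n;                        // MN
        n = 0;
        s = 0;
        mu1[K - 1] = -1;
        for (int q = K - 2; q >= 0; q--) {
            n += h[q + 1];
            s += (long) (q + 1) * h[q + 1];
            mu1[q] = (n > 0) ? (double) s / n : -1;
        }

        double sigma2bMax = 0;
        int qMax = -1;
        long n0 = 0;
        for (int q = 0; q <= K - 2; q++) {               // examine all possible thresholds
            n0 += h[q];
            long n1 = numPixels - n0;
            if (n0 > 0 && n1 > 0) {
                double d = mu0[q] - mu1[q];
                // Eqn. (11.22) without the constant factor 1/(MN)^2; double
                // arithmetic avoids the 32-bit overflow of n0 * n1
                double sigma2b = (double) n0 * n1 * d * d;
                if (sigma2b > sigma2bMax) {
                    sigma2bMax = sigma2b;
                    qMax = q;
                }
            }
        }
        return qMax;
    }
}
```

If the histogram has an empty gap between the two classes, every q inside the gap gives the same $\sigma_b^2$; because of the strict comparison, this sketch returns the lowest such q (e.g., 50 for two spikes at 50 and 200).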

³ This is the case if the image contains no pixels with values $I(u,v) \le q$ or no pixels with values $I(u,v) > q$, that is, the histogram $h$ is empty either at and below or above the index $q$.

[Fig. 11.4 (e–h), plotted over intensities 0 to 255: (e) η = 0.84, (f) η = 0.77, (g) η = 0.62, (h) η = 0.53.]

The absolute “goodness” of the final thresholding by $q_{\max}$ could be measured as the ratio

$$
\eta = \frac{\sigma_b^2(q_{\max})}{\sigma_I^2} \in [0, 1] \qquad (11.23)
$$

(see Eqn. (11.8)), which is invariant under linear changes of contrast and brightness [177]. Greater values of $\eta$ indicate better thresholding.
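The ratio in Eqn. (11.23) needs only the histogram. The following Java sketch assumes a non-empty histogram stored as `int[]`; it computes $\sigma_I^2$ with the two-pass form of Eqn. (11.8) and $\sigma_b^2(q)$ with Eqn. (11.22). Returning NaN for a uniform image (where $\sigma_I^2 = 0$) and 0 when one class is empty are choices of this sketch; the class name is hypothetical.

```java
/** Illustrative computation of the "goodness" ratio eta of Eqn. (11.23). */
final class OtsuGoodness {

    /** eta = sigma_b^2(q) / sigma_I^2; NaN for a uniform image, 0 if a class is empty. */
    static double goodness(int[] h, int q) {
        int K = h.length;
        long n = 0, n0 = 0;
        double s = 0, s0 = 0;
        for (int g = 0; g < K; g++) {
            n += h[g];
            s += (double) g * h[g];
            if (g <= q) {                          // background partition C0
                n0 += h[g];
                s0 += (double) g * h[g];
            }
        }
        long n1 = n - n0;
        double muI = s / n;
        double varI = 0;                           // Eqn. (11.8)
        for (int g = 0; g < K; g++) {
            varI += (g - muI) * (g - muI) * h[g];
        }
        varI /= n;
        if (varI == 0) {
            return Double.NaN;                     // uniform image: eta undefined
        }
        if (n0 == 0 || n1 == 0) {
            return 0;                              // sigma_b^2 = 0 if one class is empty
        }
        double d = s0 / n0 - (s - s0) / n1;        // mu0(q) - mu1(q)
        double sigma2b = (double) n0 * n1 * d * d / ((double) n * n);   // Eqn. (11.22)
        return sigma2b / varI;
    }
}
```

Two perfectly separated spikes give η = 1; for h = (2, 0, 3, 1) and q = 0, $\sigma_b^2 = 1.125$ and $\sigma_I^2 = 1.25$, so η = 0.9.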

Results of automatic threshold selection with Otsu’s method are shown in Fig. 11.4, where $q_{\max}$ denotes the optimal threshold and $\eta$ is the corresponding “goodness” estimate, as defined in Eqn. (11.23). The graph underneath each image shows the original histogram (gray) overlaid with the variance within the background $\sigma_0^2$ (green), the variance within the foreground $\sigma_1^2$ (blue), and the between-class variance $\sigma_b^2$ (red) for varying threshold values $q$. The dashed vertical line marks the position of the optimal threshold $q_{\max}$.

Fig. 11.4

Results of thresholding with Otsu’s method. Calculated threshold values $q$ and resulting binary images (a–d). Graphs in (e–h) show the corresponding within-background variance $\sigma_0^2$ (green), the within-foreground variance $\sigma_1^2$ (blue), and the between-class variance $\sigma_b^2$ (red), for varying threshold values $q = 0, \ldots, 255$. The optimal threshold $q_{\max}$ (dashed vertical line) is positioned at the maximum of $\sigma_b^2$. The value $\eta$ denotes the “goodness” estimate for the thresholding, as defined in Eqn. (11.23).

Due to the pre-calculation of the mean values, Otsu’s method requires only three passes over the histogram and is thus very fast (O(K)), contrary to claims found elsewhere in the literature. The method is frequently cited and performs well in comparison to other approaches [213], despite its long history and its simplicity. In general, the results are very similar to the ones produced by the iterative threshold selection (“isodata”) algorithm described in Sec. 11.1.3.
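The three-pass structure can be sketched in Java as follows. This is a minimal illustration under our own assumptions, not the book's implementation: the class name OtsuSketch is hypothetical, the histogram is a plain int[], and the constant factor 1/(MN)² is dropped, as discussed above. Two passes tabulate the background and foreground means μ0(q) and μ1(q); the third pass maximizes n0·n1·(μ0 − μ1)², evaluated in double to avoid integer overflow.

```java
/** Hypothetical sketch of Otsu's threshold selection on a histogram. */
final class OtsuSketch {

    /** Returns the optimal threshold for histogram h, or -1 if no valid threshold exists. */
    static int threshold(int[] h) {
        int K = h.length;
        double[] mu0 = new double[K];   // mu0[q]: mean of intensities 0..q
        double[] mu1 = new double[K];   // mu1[q]: mean of intensities q+1..K-1

        long n = 0, s = 0;
        for (int q = 0; q < K; q++) {             // pass 1: background means
            n += h[q];
            s += (long) h[q] * q;
            mu0[q] = (n > 0) ? (double) s / n : -1;
        }
        long N = n;                               // total number of pixels (MN)

        n = 0;
        s = 0;
        mu1[K - 1] = 0;
        for (int q = K - 2; q >= 0; q--) {        // pass 2: foreground means
            n += h[q + 1];
            s += (long) h[q + 1] * (q + 1);
            mu1[q] = (n > 0) ? (double) s / n : -1;
        }

        int qMax = -1;
        double sigma2bMax = 0;
        long n0 = 0;
        for (int q = 0; q <= K - 2; q++) {        // pass 3: maximize between-class variance
            n0 += h[q];
            long n1 = N - n0;
            if (n0 > 0 && n1 > 0) {
                double d = mu0[q] - mu1[q];
                // product formed in double; the constant factor 1/(MN)^2 is omitted
                double sigma2b = (double) n0 * n1 * d * d;
                if (sigma2b > sigma2bMax) {
                    sigma2bMax = sigma2b;
                    qMax = q;
                }
            }
        }
        return qMax;
    }
}
```

For a histogram with two equally populated spikes at 50 and 200, every q in [50, 199] yields the same score and the sketch keeps the first one, 50; for a constant image it returns −1.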


11.1.5  Maximum  Entropy  Thresholding 

Entropy is an important concept in information theory and particularly in data compression. It is a statistical measure that quantifies the average amount of information contained in the “messages” generated by a stochastic data source [99, 101]. For example, the MN pixels in an image I can be interpreted as a message of MN symbols, each taken independently from a finite alphabet of K (e.g., 256) different intensity values. Every pixel is assumed to be statistically independent. Knowing the probability of each intensity value g to occur, entropy measures how much uncertainty, on average, is involved in observing a particular image, or, in other words, how much we should be surprised to see such an image. Before going into further details, we briefly review the notion of probabilities in the context of images and histograms (see also Ch. 4, Sec. 4.6.1).

For modeling the image generation as a random process, we first need to define an “alphabet”, that is, a set of symbols

$$Z = \{0, 1, \ldots, K-1\}, \qquad (11.24)$$

which in this case is simply the set of possible intensity values g = 0, ..., K−1, together with the probability p(g) that a particular intensity value g occurs. These probabilities are supposed to be known in advance, which is why they are called a priori (or prior) probabilities. The vector of probabilities,

$$\bigl(p(0), p(1), \ldots, p(K-1)\bigr),$$

is a probability distribution or probability density function (pdf). In practice, the a priori probabilities are usually unknown, but they can be estimated by observing how often the intensity values actually occur in one or more images, assuming that these are representative instances of the images typically produced by that source. An estimate p̂(g) of the image’s probability density function p(g) is obtained by normalizing its histogram h in the form

$$p(g) \approx \hat{p}(g) = \frac{h(g)}{MN}, \qquad (11.25)$$

for 0 ≤ g < K, such that 0 ≤ p̂(g) ≤ 1 and Σ_{g=0}^{K−1} p̂(g) = 1. The associated cumulative distribution function (cdf) is

$$P(g) = \sum_{i=0}^{g} \frac{h(i)}{MN} = \sum_{i=0}^{g} p(i), \qquad (11.26)$$

where P(0) = p(0) and P(K−1) = 1. This is simply the normalized cumulative histogram.4
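Eqns. (11.25) and (11.26) translate directly into code. The Java sketch below (class and method names are hypothetical) normalizes a histogram by its total count MN and accumulates the result into the cdf.

```java
/** Hypothetical helper: normalized histogram (Eqn. 11.25) and cdf (Eqn. 11.26). */
final class HistogramStats {

    // p(g) = h(g) / MN, where MN is the sum of all histogram entries
    static double[] normalize(int[] h) {
        long total = 0;
        for (int count : h) total += count;
        double[] p = new double[h.length];
        for (int g = 0; g < h.length; g++) {
            p[g] = (double) h[g] / total;
        }
        return p;
    }

    // P(g) = p(0) + ... + p(g), computed as a running sum
    static double[] cdf(double[] p) {
        double[] P = new double[p.length];
        double sum = 0;
        for (int g = 0; g < p.length; g++) {
            sum += p[g];
            P[g] = sum;
        }
        return P;
    }
}
```

For h = (1, 2, 3, 4), normalize yields (0.1, 0.2, 0.3, 0.4) and cdf yields (0.1, 0.3, 0.6, 1.0), up to floating-point rounding.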

Entropy  of  images 

Given an estimate of its intensity probability distribution p(g), the entropy of an image is defined as5

$$H(Z) = \sum_{g \in Z} p(g) \cdot \log_b\!\left(\frac{1}{p(g)}\right) = -\sum_{g \in Z} p(g) \cdot \log_b\bigl(p(g)\bigr), \qquad (11.27)$$

where g = I(u,v) and log_b(x) denotes the logarithm of x to the base b. If b = 2, the entropy (or “information content”) is measured in bits, but proportional results are obtained with any other logarithm (such as ln or log10). Note that the value of H(Z) is never negative, because the probabilities p(g) are in [0, 1] and thus the terms log_b(p(g)) are negative or zero for any base b > 1. Terms with p(g) = 0 are taken to contribute zero, since p·log_b(p) → 0 as p → 0.

4 See also Chapter 3, Sec. 3.6.

5 Note the subtle difference in notation (typeface) between the cumulative histogram H and the entropy H.

Some other properties of the entropy are also quite intuitive. For example, if all probabilities p(g) are zero except for a single intensity g′ (with p(g′) = 1), then the entropy H(Z) is zero, indicating that there is no uncertainty (or “surprise”) in the messages produced by the corresponding data source. The (rather boring) images generated by this source will contain nothing but pixels of intensity g′, since all other intensities are impossible. Conversely, the entropy is a maximum if all K intensities have the same probability (uniform distribution),

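$$p(g) = \frac{1}{K}, \quad \text{for } 0 \le g < K, \qquad (11.28)$$

and therefore (from Eqn. (11.27)) the entropy in this case is

$$H(Z) = -\sum_{i=0}^{K-1} \frac{1}{K} \cdot \log_b\!\left(\frac{1}{K}\right) = \frac{1}{K} \cdot \underbrace{\sum_{i=0}^{K-1} \log_b(K)}_{K \cdot \log_b(K)} \qquad (11.29)$$

$$= \frac{1}{K} \cdot \bigl(K \cdot \log_b(K)\bigr) = \log_b(K). \qquad (11.30)$$

This is the maximum possible entropy of a stochastic source with an alphabet Z of size K. Thus the entropy H(Z) is always in the range [0, log_b(K)].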
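The entropy of Eqn. (11.27) takes only a few lines of Java. In the sketch below (the class name is hypothetical), zero-probability bins are skipped, which implements the convention 0·log(0) = 0, and the base b is passed explicitly; natural logarithms are converted via log_b(x) = ln(x)/ln(b).

```java
/** Hypothetical helper computing the entropy of a discrete distribution p. */
final class EntropySketch {

    // H = -sum_g p(g) * log_b(p(g)); bins with p(g) = 0 contribute nothing
    static double entropy(double[] p, double base) {
        double H = 0;
        for (double pg : p) {
            if (pg > 0) {
                H -= pg * Math.log(pg);   // natural logarithm
            }
        }
        return H / Math.log(base);        // change of base
    }
}
```

For a uniform distribution over K = 256 values and b = 2 it returns 8 bits (up to rounding), that is, log2(256), in agreement with Eqn. (11.30); for a distribution concentrated on a single value it returns 0.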


Using  image  entropy  for  threshold  selection 

The use of image entropy as a criterion for threshold selection has a long tradition, and numerous methods have been proposed. In the following, we describe the early (but still popular) technique by Kapur et al. [100, 133] as a representative example.

Given a particular threshold q (with 0 ≤ q < K−1), the estimated probability distributions for the resulting partitions C0 and C1 are

$$C_0: \left(\frac{p(0)}{P_0(q)}, \frac{p(1)}{P_0(q)}, \ldots, \frac{p(q)}{P_0(q)}, 0, \ldots, 0\right),$$

$$C_1: \left(0, \ldots, 0, \frac{p(q+1)}{P_1(q)}, \frac{p(q+2)}{P_1(q)}, \ldots, \frac{p(K-1)}{P_1(q)}\right), \qquad (11.31)$$


with the associated cumulated probabilities (see Eqn. (11.26))

$$P_0(q) = \sum_{i=0}^{q} p(i) = P(q) \quad\text{and}\quad P_1(q) = \sum_{i=q+1}^{K-1} p(i) = 1 - P(q). \qquad (11.32)$$


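Note that P0(q) + P1(q) = 1, since the background and foreground partitions are disjoint and together cover all intensity values. The entropies within each partition are defined as

$$H_0(q) = -\sum_{i=0}^{q} \frac{p(i)}{P_0(q)} \cdot \log\!\left(\frac{p(i)}{P_0(q)}\right), \qquad (11.33)$$

$$H_1(q) = -\sum_{i=q+1}^{K-1} \frac{p(i)}{P_1(q)} \cdot \log\!\left(\frac{p(i)}{P_1(q)}\right), \qquad (11.34)$$

and the overall entropy for the threshold q is

$$H_{01}(q) = H_0(q) + H_1(q). \qquad (11.35)$$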

This expression, also called the “information between the classes” C0 and C1, is to be maximized over q. To allow for an efficient computation, the expression for H0(q) in Eqn. (11.33) can be rearranged to

$$H_0(q) = -\sum_{i=0}^{q} \frac{p(i)}{P_0(q)} \cdot \bigl[\log(p(i)) - \log(P_0(q))\bigr] \qquad (11.36)$$

$$= -\frac{1}{P_0(q)} \cdot \sum_{i=0}^{q} p(i) \cdot \bigl[\log(p(i)) - \log(P_0(q))\bigr] \qquad (11.37)$$

$$= -\frac{1}{P_0(q)} \cdot \underbrace{\sum_{i=0}^{q} p(i) \cdot \log(p(i))}_{S_0(q)} + \frac{1}{P_0(q)} \cdot \underbrace{\sum_{i=0}^{q} p(i)}_{P_0(q)} \cdot \log(P_0(q))$$

$$= -\frac{1}{P_0(q)} \cdot S_0(q) + \log(P_0(q)). \qquad (11.38)$$


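Similarly, H1(q) in Eqn. (11.34) becomes

$$H_1(q) = -\sum_{i=q+1}^{K-1} \frac{p(i)}{P_1(q)} \cdot \bigl[\log(p(i)) - \log(P_1(q))\bigr] \qquad (11.39)$$

$$= -\frac{1}{1 - P_0(q)} \cdot S_1(q) + \log\bigl(1 - P_0(q)\bigr), \qquad (11.40)$$

with S1(q) = Σ_{i=q+1}^{K−1} p(i)·log(p(i)).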
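To check the rearrangement numerically, the Java sketch below (hypothetical class name, natural logarithms) evaluates H0(q) both directly from Eqn. (11.33) and via Eqn. (11.38). It assumes P0(q) > 0; the foreground entropy H1(q) can be checked the same way with Eqn. (11.40).

```java
/** Hypothetical check that Eqns. (11.33) and (11.38) give the same H0(q). */
final class PartitionEntropyCheck {

    // Direct form, Eqn. (11.33): entropy of the renormalized background distribution
    static double h0Direct(double[] p, int q) {
        double P0 = 0;
        for (int i = 0; i <= q; i++) P0 += p[i];
        double H = 0;
        for (int i = 0; i <= q; i++) {
            if (p[i] > 0) H -= (p[i] / P0) * Math.log(p[i] / P0);
        }
        return H;
    }

    // Rearranged form, Eqn. (11.38): H0 = -S0(q)/P0(q) + log(P0(q))
    static double h0Rearranged(double[] p, int q) {
        double P0 = 0, S0 = 0;
        for (int i = 0; i <= q; i++) {
            P0 += p[i];
            if (p[i] > 0) S0 += p[i] * Math.log(p[i]);
        }
        return -S0 / P0 + Math.log(P0);
    }
}
```

For p = (0.1, 0.2, 0.3, 0.4) and q = 1, both functions return about 0.6365, the entropy of the renormalized background distribution (1/3, 2/3).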


Given the estimated probability distribution p(i), the cumulative probability P0 and the summation terms S0, S1 (see Eqns. (11.38)–(11.40)) can be calculated from the recurrence relations

$$P_0(q) = \begin{cases} p(0) & \text{for } q = 0,\\ P_0(q-1) + p(q) & \text{for } 0 < q < K, \end{cases}$$

$$S_0(q) = \begin{cases} p(0) \cdot \log(p(0)) & \text{for } q = 0,\\ S_0(q-1) + p(q) \cdot \log(p(q)) & \text{for } 0 < q < K, \end{cases}$$

$$S_1(q) = \begin{cases} 0 & \text{for } q = K-1,\\ S_1(q+1) + p(q+1) \cdot \log(p(q+1)) & \text{for } 0 \le q < K-1. \end{cases} \qquad (11.41)$$


The complete procedure is summarized in Alg. 11.5, where the values S0(q), S1(q) are obtained from precalculated tables S0, S1. The algorithm performs three passes over the histogram of length K (two for filling the tables S0, S1 and one in the main loop), so its time complexity is O(K), like the algorithms described before.
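A compact Java rendering of the maximum-entropy procedure might look like the following. It is a sketch under our own assumptions, not the book's program: the class name MaxEntropySketch is hypothetical, natural logarithms are used (any base yields the same threshold), and the histogram is assumed to be non-empty. The tables S0 and S1 are filled in two passes, and the main loop evaluates H01(q) via Eqns. (11.38) and (11.40).

```java
/** Hypothetical sketch of maximum-entropy threshold selection (after Kapur et al.). */
final class MaxEntropySketch {

    static int threshold(int[] h) {
        int K = h.length;
        long total = 0;
        for (int v : h) total += v;
        double[] p = new double[K];
        for (int i = 0; i < K; i++) p[i] = (double) h[i] / total;   // Eqn. (11.25)

        // S0[q] = sum_{i<=q} p(i) log p(i),  S1[q] = sum_{i>q} p(i) log p(i)
        double[] S0 = new double[K], S1 = new double[K];
        double s0 = 0;
        for (int i = 0; i < K; i++) {
            if (p[i] > 0) s0 += p[i] * Math.log(p[i]);
            S0[i] = s0;
        }
        double s1 = 0;
        for (int i = K - 1; i >= 0; i--) {
            S1[i] = s1;                       // store before adding p(i): sum over i+1..K-1
            if (p[i] > 0) s1 += p[i] * Math.log(p[i]);
        }

        int qMax = -1;
        double hMax = Double.NEGATIVE_INFINITY;
        double P0 = 0;
        for (int q = 0; q <= K - 2; q++) {
            P0 += p[q];
            double P1 = 1 - P0;
            double H0 = (P0 > 0) ? -S0[q] / P0 + Math.log(P0) : 0;   // Eqn. (11.38)
            double H1 = (P1 > 0) ? -S1[q] / P1 + Math.log(P1) : 0;   // Eqn. (11.40)
            double H01 = H0 + H1;                                    // Eqn. (11.35)
            if (H01 > hMax) {
                hMax = H01;
                qMax = q;
            }
        }
        return qMax;
    }
}
```

For a histogram with one count in each of the bins 0 to 9 and 20 to 29, the sketch returns 9: at that point both partitions are uniform over ten values, so H01 = 2·ln(10), the largest value attainable for this histogram. Since bins 10 to 19 are empty, the score stays constant up to q = 19, and the first maximum is kept.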

Results obtained with this technique are shown in Fig. 11.5. The technique described in this section is simple and efficient, because it again relies entirely on the image’s histogram. More advanced entropy-based thresholding techniques exist that, among other improvements, take into account the spatial structure of the original image. An extensive review of entropy-based methods can be found in [46].


Alg. 11.5

Maximum entropy threshold selection after Kapur et al. [133]. Initially (outside the for-loop), the threshold q is assumed to be −1, which corresponds to the background class being empty (n0 = 0) and all pixels assigned to the foreground class (n1 = N). The for-loop (lines 8–16) examines each possible threshold q = 0, ..., K−2. The optimal threshold value (0, ..., K−2) is returned, or −1 if no valid threshold was found.

1:  MaximumEntropyThreshold(h)
      Input: h : [0, K−1] → ℕ, a grayscale histogram. Returns the
      optimal threshold value or −1 if no threshold is found.
2:    K ← Size(h)                          ▷ number of intensity levels
3:    p ← Normalize(h)                     ▷ normalize histogram
4:    (S0, S1) ← MakeTables(p, K)          ▷ tables for S0(q), S1(q)
5:    P0 ← 0                               ▷ P0 ∈ [0, 1]
6:    q_max ← −1
7:    H_max ← −∞                           ▷ maximum joint entropy
8:    for q ← 0, ..., K−2 do               ▷ check all possible threshold values q
9:      P0 ← P0 + p(q)
10:     P1 ← 1 − P0                        ▷ P1 ∈ [0, 1]
11:     H0 ← −(1/P0)·S0(q) + log(P0) if P0 > 0, otherwise 0    ▷ BG entropy
12:     H1 ← −(1/P1)·S1(q) + log(P1) if P1 > 0, otherwise 0    ▷ FG entropy
13:     H01 ← H0 + H1                      ▷ overall entropy for q
14:     if H01 > H_max then                ▷ maximize H01(q)
15:       H_max ← H01
16:       q_max ← q
17:   return q_max

18: MakeTables(p, K)
19:   Create maps S0, S1 : [0, K−1] → ℝ
20:   s0 ← 0
21:   for i ← 0, ..., K−1 do               ▷ initialize table S0
22:     if p(i) > 0 then
23:       s0 ← s0 + p(i)·log(p(i))
24:     S0(i) ← s0
25:   s1 ← 0
26:   for i ← K−1, ..., 0 do               ▷ initialize table S1
27:     S1(i) ← s1
28:     if p(i) > 0 then
29:       s1 ← s1 + p(i)·log(p(i))
30:   return (S0, S1)

Fig. 11.5

Thresholding with the maximum-entropy method. Calculated threshold values q and resulting binary images (a)–(d). Graphs in (e)–(h) show the background entropy H0(q) (green), foreground entropy H1(q) (blue), and overall entropy H01(q) = H0(q) + H1(q) (red), for varying threshold values q. The optimal threshold q_max is found at the maximum of H01 (dashed vertical line).


11.1.6 Minimum Error Thresholding

The goal of minimum error thresholding is to optimally fit a combination (mixture) of Gaussian distributions to the image’s histogram. Before we proceed, we briefly look at some additional concepts from statistics. Note, however, that the following material is only intended as a superficial outline to explain the elementary concepts. For a solid grounding in these and related topics, readers are referred to the excellent texts available on statistical pattern recognition, such as [24, 64].

Bayesian  decision  making 

The assumption is again that the image pixels originate from one of two classes, C0 and C1, or background and foreground, respectively. Both classes generate random intensity values following unknown statistical distributions. Typically, these are modeled as Gaussian distributions with unknown parameters μ and σ², as described below. The task is to decide for each pixel value x to which of the two classes it most likely belongs. Bayesian reasoning is a classic technique for making such decisions in a probabilistic context.

The probability of observing a certain intensity value x, given that the pixel belongs to the background, is denoted

$$p(x \mid C_0).$$

This is called a “conditional probability”.6 It tells us how likely it is to observe the gray value x when a pixel is a member of the background class C0. Analogously, p(x | C1) is the conditional probability of observing the value x when a pixel is known to be of the foreground class C1.

For the moment, let us assume that the conditional probability functions p(x | C0) and p(x | C1) are known. Our problem is reversed, though, namely to decide which class a pixel most likely belongs to, given that its intensity is x. This means that we are actually interested in the conditional probabilities

$$p(C_0 \mid x) \quad\text{and}\quad p(C_1 \mid x), \qquad (11.42)$$


also called a posteriori (or posterior) probabilities. If we knew these, we could simply select the class with the higher probability in the form

$$C = \begin{cases} C_0 & \text{if } p(C_0 \mid x) > p(C_1 \mid x),\\ C_1 & \text{otherwise}. \end{cases} \qquad (11.43)$$

Bayes’ theorem provides a method for estimating these posterior probabilities, that is,

$$p(C_j \mid x) = \frac{p(x \mid C_j) \cdot p(C_j)}{p(x)}, \qquad (11.44)$$

where p(C_j) is the prior probability of class C_j. While, in theory, the prior probabilities are also unknown, they can be easily estimated from the image histogram (see also Sec. 11.1.5). Finally, p(x) in Eqn. (11.44) is the overall probability of observing the intensity value x, which is typically estimated from its relative frequency in one or more images.7

6 In general, p(A | B) denotes the (conditional) probability of observing the event A in a given situation B. It is usually read as “the probability of A, given B”.

Note that for a particular intensity x, the corresponding evidence p(x) scales both posterior probabilities by the same factor and is thus not relevant for the classification itself. Consequently, we can reformulate the binary decision rule in Eqn. (11.43) to

$$C = \begin{cases} C_0 & \text{if } p(x \mid C_0) \cdot p(C_0) > p(x \mid C_1) \cdot p(C_1),\\ C_1 & \text{otherwise}. \end{cases} \qquad (11.45)$$

This is called Bayes’ decision rule. It minimizes the probability of making a classification error if the involved probabilities are known and is also called the “minimum error” criterion.


Gaussian  probability  distributions 

If the probability distributions p(x | C_j) are modeled as Gaussian8 distributions N(x | μ_j, σ_j²), where μ_j, σ_j² denote the mean and variance of class C_j, we can rewrite the scaled posterior probabilities in Eqn. (11.45) as

$$p(x \mid C_j) \cdot p(C_j) = \frac{1}{\sqrt{2\pi\sigma_j^2}} \cdot \exp\!\left(-\frac{(x-\mu_j)^2}{2\sigma_j^2}\right) \cdot p(C_j). \qquad (11.46)$$


As long as the ordering between the resulting class scores remains unchanged, these quantities can be scaled or transformed arbitrarily. In particular, it is common to use the logarithm of the above expression to avoid repeated multiplications of small numbers. For example, applying the natural logarithm9 to both sides of Eqn. (11.46) yields

$$\ln\bigl(p(x \mid C_j) \cdot p(C_j)\bigr) = \ln\bigl(p(x \mid C_j)\bigr) + \ln\bigl(p(C_j)\bigr) \qquad (11.47)$$

$$= \ln\!\left(\frac{1}{\sqrt{2\pi\sigma_j^2}}\right) + \ln\!\left(\exp\!\left(-\frac{(x-\mu_j)^2}{2\sigma_j^2}\right)\right) + \ln\bigl(p(C_j)\bigr) \qquad (11.48)$$

$$= -\frac{1}{2} \cdot \ln(2\pi) - \frac{1}{2} \cdot \ln(\sigma_j^2) - \frac{(x-\mu_j)^2}{2\sigma_j^2} + \ln\bigl(p(C_j)\bigr) \qquad (11.49)$$

$$= -\frac{1}{2} \cdot \left[\ln(2\pi) + \frac{(x-\mu_j)^2}{\sigma_j^2} + \ln(\sigma_j^2) - 2 \cdot \ln\bigl(p(C_j)\bigr)\right]. \qquad (11.50)$$

Since ln(2π) in Eqn. (11.50) is constant, it can be ignored for the classification decision, and so can the positive factor 1/2 in front. Thus, to find the class C_j that maximizes p(x | C_j)·p(C_j) for a given intensity value x, it is sufficient to maximize the quantity

$$-\frac{(x-\mu_j)^2}{\sigma_j^2} - 2 \cdot \bigl[\ln(\sigma_j) - \ln\bigl(p(C_j)\bigr)\bigr] \qquad (11.51)$$

or, alternatively, to minimize

$$\varepsilon_j(x) = \frac{(x-\mu_j)^2}{\sigma_j^2} + 2 \cdot \bigl[\ln(\sigma_j) - \ln\bigl(p(C_j)\bigr)\bigr]. \qquad (11.52)$$

7 p(x) is also called the “evidence” for the event x.

8 See also Sec. D.4 in the Appendix.

9 Any logarithm could be used, but the natural logarithm complements the exponential function of the Gaussian.


The quantity ε_j(x) can be viewed as a measure of the potential error involved in classifying the observed value x as being of class C_j. To obtain the decision associated with the minimum risk, we can modify the binary decision rule in Eqn. (11.45) to

$$C = \begin{cases} C_0 & \text{if } \varepsilon_0(x) < \varepsilon_1(x),\\ C_1 & \text{otherwise}. \end{cases} \qquad (11.53)$$

Remember that this rule tells us how to classify the observed intensity value x as being either of the background class C0 or the foreground class C1 with the minimum probability of error, assuming that the underlying distributions are really Gaussian and their parameters are well estimated.
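The decision rule of Eqn. (11.53) is easy to evaluate once the class parameters are known. The Java sketch below (class and method names are hypothetical) computes ε_j(x) from Eqn. (11.52) for each class and picks the smaller one. The example parameters used in the usage note are the estimates listed for the synthetic image in Fig. 11.6(d); the equal priors of 0.5 are an assumption made only for illustration.

```java
/** Hypothetical sketch of the minimum-error decision rule for two Gaussian classes. */
final class MinErrorClassifier {

    // epsilon_j(x) = (x - mu)^2 / sigma^2 + 2 * (ln(sigma) - ln(prior)), Eqn. (11.52)
    static double eps(double x, double mu, double sigma, double prior) {
        double d = x - mu;
        return (d * d) / (sigma * sigma) + 2 * (Math.log(sigma) - Math.log(prior));
    }

    // Decision rule (11.53): returns 0 for class C0, 1 for class C1
    static int classify(double x,
                        double mu0, double sigma0, double prior0,
                        double mu1, double sigma1, double prior1) {
        return eps(x, mu0, sigma0, prior0) < eps(x, mu1, sigma1, prior1) ? 0 : 1;
    }
}
```

With μ0 = 80.12, σ0 = 19.98, μ1 = 171.93, σ1 = 17.80 and both priors set to 0.5, classify returns 0 (background) for x = 100 and 1 (foreground) for x = 150.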


Goodness  of  classification 

If we apply a threshold q, all pixel values g ≤ q are implicitly classified as C0 (background) and all g > q as C1 (foreground). The goodness of this classification by q over all MN image pixels I(u,v) can be measured with the criterion function

$$e(q) = \frac{1}{MN} \cdot \sum_{u,v} \begin{cases} \varepsilon_0(I(u,v)) & \text{for } I(u,v) \le q,\\ \varepsilon_1(I(u,v)) & \text{for } I(u,v) > q \end{cases} \qquad (11.54)$$

$$= \frac{1}{MN} \cdot \sum_{g=0}^{q} h(g) \cdot \varepsilon_0(g) + \frac{1}{MN} \cdot \sum_{g=q+1}^{K-1} h(g) \cdot \varepsilon_1(g) \qquad (11.55)$$

$$= \sum_{g=0}^{q} p(g) \cdot \varepsilon_0(g) + \sum_{g=q+1}^{K-1} p(g) \cdot \varepsilon_1(g), \qquad (11.56)$$

with the normalized frequencies p(g) = h(g)/MN and the function ε_j(g) as defined in Eqn. (11.52). By substituting ε_j(g) from Eqn. (11.52) and some mathematical gymnastics (the key step being that Σ_{g≤q} p(g)·(g − μ0)²/σ0² = P0(q), and likewise P1(q) for the foreground, so these two terms add up to 1), e(q) can be written as

$$e(q) = 1 + P_0(q) \cdot \ln\bigl(\sigma_0^2(q)\bigr) + P_1(q) \cdot \ln\bigl(\sigma_1^2(q)\bigr) - 2 \cdot P_0(q) \cdot \ln\bigl(P_0(q)\bigr) - 2 \cdot P_1(q) \cdot \ln\bigl(P_1(q)\bigr). \qquad (11.57)$$

The remaining task is to find the threshold q that minimizes e(q) (where the constant 1 in Eqn. (11.57) can, of course, be omitted). For each possible threshold q, we only need to estimate (from the image’s histogram, as in Eqn. (11.31)) the “prior” probabilities P0(q), P1(q) and the corresponding within-class variances σ0²(q), σ1²(q). The prior probabilities for the background and foreground classes are estimated as

$$P_0(q) \approx \sum_{g=0}^{q} p(g) = \frac{1}{MN} \cdot \sum_{g=0}^{q} h(g) = \frac{n_0(q)}{MN}, \qquad (11.58)$$

$$P_1(q) \approx \sum_{g=q+1}^{K-1} p(g) = \frac{1}{MN} \cdot \sum_{g=q+1}^{K-1} h(g) = \frac{n_1(q)}{MN}, \qquad (11.59)$$

where n0(q) = Σ_{g=0}^{q} h(g), n1(q) = Σ_{g=q+1}^{K−1} h(g), and MN = n0(q) + n1(q) is the total number of image pixels. Estimates for the background and foreground variances (σ0²(q) and σ1²(q), respectively), defined in Eqn. (11.7), can be calculated efficiently by expressing them in the form


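$$\sigma_0^2(q) = \frac{1}{n_0(q)} \cdot \sum_{g=0}^{q} h(g) \cdot g^2 - \frac{1}{n_0^2(q)} \cdot \Bigl(\sum_{g=0}^{q} h(g) \cdot g\Bigr)^2 = \frac{1}{n_0(q)} \cdot \Bigl(B_0(q) - \frac{A_0^2(q)}{n_0(q)}\Bigr), \qquad (11.60)$$

$$\sigma_1^2(q) = \frac{1}{n_1(q)} \cdot \sum_{g=q+1}^{K-1} h(g) \cdot g^2 - \frac{1}{n_1^2(q)} \cdot \Bigl(\sum_{g=q+1}^{K-1} h(g) \cdot g\Bigr)^2 = \frac{1}{n_1(q)} \cdot \Bigl(B_1(q) - \frac{A_1^2(q)}{n_1(q)}\Bigr), \qquad (11.61)$$

with the quantities

$$A_0(q) = \sum_{g=0}^{q} h(g) \cdot g, \qquad B_0(q) = \sum_{g=0}^{q} h(g) \cdot g^2,$$

$$A_1(q) = \sum_{g=q+1}^{K-1} h(g) \cdot g, \qquad B_1(q) = \sum_{g=q+1}^{K-1} h(g) \cdot g^2. \qquad (11.62)$$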


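Furthermore, the values σ0²(q), σ1²(q) can be tabulated for every possible q in only two passes over the histogram, using the recurrence relations

$$A_0(q) = \begin{cases} 0 & \text{for } q = 0,\\ A_0(q-1) + h(q) \cdot q & \text{for } 1 \le q \le K-1, \end{cases} \qquad (11.63)$$

$$B_0(q) = \begin{cases} 0 & \text{for } q = 0,\\ B_0(q-1) + h(q) \cdot q^2 & \text{for } 1 \le q \le K-1, \end{cases} \qquad (11.64)$$

$$A_1(q) = \begin{cases} 0 & \text{for } q = K-1,\\ A_1(q+1) + h(q+1) \cdot (q+1) & \text{for } 0 \le q \le K-2, \end{cases} \qquad (11.65)$$

$$B_1(q) = \begin{cases} 0 & \text{for } q = K-1,\\ B_1(q+1) + h(q+1) \cdot (q+1)^2 & \text{for } 0 \le q \le K-2. \end{cases} \qquad (11.66)$$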


The complete minimum-error threshold calculation is summarized in Alg. 11.6. First, the tables S0, S1 are set up and initialized with the values of σ0²(q), σ1²(q), respectively, for 0 ≤ q < K, following the recursive scheme in Eqns. (11.63)–(11.66). Subsequently, the error value e(q) is calculated for every possible threshold value q to find the global minimum. Again, e(q) can only be calculated for those values of q for which both resulting partitions are non-empty (i.e., with n0(q), n1(q) > 0). Note that, in lines 27 and 37 of Alg. 11.6, a small constant (1/12) is added to the variance to avoid zero values (which would make ln(σ²) in Eqn. (11.57) undefined) when the corresponding class population is homogeneous (i.e., only contains a single intensity value).10 This ensures that the algorithm works properly on images with only two distinct gray values. The algorithm computes the optimal threshold by performing three passes over the histogram (two for initializing the tables and one for finding the minimum); it thus has the same time complexity of O(K) as the algorithms described before.
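The minimum-error procedure of Alg. 11.6 can be sketched in Java as below. This is our own illustration rather than the book's program: the class name MinErrorSketch is hypothetical, the sums A and B are accumulated in double to stay clear of integer overflow, and the constant 1 of Eqn. (11.57) is omitted, since it does not affect the location of the minimum. The 1/12 offset mirrors lines 27 and 37 of Alg. 11.6.

```java
/** Hypothetical sketch of minimum-error threshold selection (Gaussian mixture model). */
final class MinErrorSketch {

    static int threshold(int[] h) {
        int K = h.length;
        double[] S0 = new double[K], S1 = new double[K];   // sigma0^2(q), sigma1^2(q)

        // forward pass: background variances, Eqns. (11.60), (11.63), (11.64)
        long n0 = 0;
        double A0 = 0, B0 = 0;
        for (int q = 0; q < K; q++) {
            n0 += h[q];
            A0 += (double) h[q] * q;
            B0 += (double) h[q] * q * q;
            S0[q] = (n0 > 0) ? 1.0 / 12 + (B0 - A0 * A0 / n0) / n0 : 0;
        }
        long N = n0;                                       // total number of pixels

        // backward pass: foreground variances, Eqns. (11.61), (11.65), (11.66)
        long n1 = 0;
        double A1 = 0, B1 = 0;
        S1[K - 1] = 0;
        for (int q = K - 2; q >= 0; q--) {
            n1 += h[q + 1];
            A1 += (double) h[q + 1] * (q + 1);
            B1 += (double) h[q + 1] * (q + 1) * (q + 1);
            S1[q] = (n1 > 0) ? 1.0 / 12 + (B1 - A1 * A1 / n1) / n1 : 0;
        }

        // main pass: minimize e(q) of Eqn. (11.57), without the constant 1
        int qMin = -1;
        double eMin = Double.POSITIVE_INFINITY;
        n0 = 0;
        for (int q = 0; q <= K - 2; q++) {
            n0 += h[q];
            n1 = N - n0;
            if (n0 > 0 && n1 > 0) {
                double P0 = (double) n0 / N, P1 = (double) n1 / N;
                double e = P0 * Math.log(S0[q]) + P1 * Math.log(S1[q])
                         - 2 * (P0 * Math.log(P0) + P1 * Math.log(P1));
                if (e < eMin) {
                    eMin = e;
                    qMin = q;
                }
            }
        }
        return qMin;
    }
}
```

For a histogram with two equally populated spikes at 50 and 200, both class variances reduce to 1/12 for every q in [50, 199], so e(q) is constant there and the first candidate, 50, is returned. A constant image yields −1.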

Figure 11.6 shows the results of minimum-error thresholding on our set of test images. It also shows the fitted pair of Gaussian distributions for the background and the foreground pixels, respectively, for the optimal threshold, as well as the graphs of the error function e(q), which is minimized over all threshold values q. Evidently, the error function is quite flat in certain cases, indicating that similar scores are obtained for a wide range of threshold values and the optimal threshold is not very distinct. We can also see that the estimate is quite accurate in the case of the synthetic test image in Fig. 11.6(d), which is actually generated as a mixture of two Gaussians (with parameters μ0 = 80, μ1 = 170 and σ0 = σ1 = 20). Note that the histograms in Fig. 11.6 have been normalized to constant area so that the Gaussian curves, scaled by their prior probabilities (P0, P1), can be shown in proportion, while the original histograms are scaled with respect to their maximum values.


Fig. 11.6

Results from minimum-error thresholding. Calculated threshold values q and resulting binary images (a)–(d). The green and blue graphs in (e)–(h) show the fitted Gaussian background and foreground distributions N0 = (μ0, σ0) and N1 = (μ1, σ1), respectively. The red graph corresponds to the error quantity e(q) for varying threshold values q = 0, ..., 255 (see Eqn. (11.57)). The optimal threshold q_min is located at the minimum of e(q) (dashed vertical line). The estimated parameters of the background/foreground Gaussians are listed below.


Estimated Gaussian parameters for panels (a)–(d):

Panel    q      μ0      σ0      μ1       σ1
(a)      161    97.18   39.48   181.74    7.09
(b)       50    33.16    7.28   164.94   51.04
(c)       43    12.96    8.74   168.44   32.22
(d)      140    80.12   19.98   171.93   17.80

A minor theoretical problem with the minimum error technique is that the parameters of the Gaussian distributions are always estimated from truncated samples. This means that, for any threshold q, only the intensity values less than or equal to q are used to estimate the parameters of the background distribution, and only the intensities greater than q contribute to the foreground parameters. In practice, this problem is of minor importance, since the distributions are typically not strictly Gaussian either.


10 This is explained by the fact that each histogram bin h(i) represents intensities in the continuous range [i − 0.5, i + 0.5], and the variance of uniformly distributed values in the unit interval is 1/12.
Alg. 11.6

Minimum error threshold selection based on a Gaussian mixture model (after [116]). Tables S0, S1 are initialized with values σ0²(q) and σ1²(q), respectively (see Eqns. (11.60)–(11.61)), for all possible threshold values q = 0, ..., K−1. N is the number of image pixels. Initially (outside the for-loop), the threshold q is assumed to be −1, which corresponds to the background class being empty (n0 = 0) and all pixels assigned to the foreground class (n1 = N). The for-loop (lines 7–16) examines each possible threshold q = 0, ..., K−2. The optimal threshold is returned, or −1 if no valid threshold was found.

1:  MinimumErrorThreshold(h)
      Input: h : [0, K−1] → ℕ, a grayscale histogram. Returns the
      optimal threshold value or −1 if no threshold is found.
2:    K ← Size(h)
3:    (S0, S1, N) ← MakeSigmaTables(h, K)
4:    n0 ← 0
5:    q_min ← −1
6:    e_min ← ∞
7:    for q ← 0, ..., K−2 do               ▷ evaluate all possible thresholds q
8:      n0 ← n0 + h(q)                     ▷ background population
9:      n1 ← N − n0                        ▷ foreground population
10:     if (n0 > 0) ∧ (n1 > 0) then
11:       P0 ← n0/N                        ▷ prior probability of C0
12:       P1 ← n1/N                        ▷ prior probability of C1
13:       e ← P0·ln(S0(q)) + P1·ln(S1(q))
              − 2·(P0·ln(P0) + P1·ln(P1))  ▷ Eqn. (11.57)
14:       if e < e_min then                ▷ minimize error for q
15:         e_min ← e
16:         q_min ← q
17:   return q_min

18: MakeSigmaTables(h, K)
19:   Create maps S0, S1 : [0, K−1] → ℝ
20:   n0 ← 0
21:   A0 ← 0
22:   B0 ← 0
23:   for q ← 0, ..., K−1 do               ▷ tabulate σ0²(q)
24:     n0 ← n0 + h(q)
25:     A0 ← A0 + h(q)·q                   ▷ Eqn. (11.63)
26:     B0 ← B0 + h(q)·q²                  ▷ Eqn. (11.64)
27:     S0(q) ← 1/12 + (B0 − A0²/n0)/n0 if n0 > 0, otherwise 0    ▷ Eqn. (11.60)
28:   N ← n0
29:   n1 ← 0
30:   A1 ← 0
31:   B1 ← 0
32:   S1(K−1) ← 0
33:   for q ← K−2, ..., 0 do               ▷ tabulate σ1²(q)
34:     n1 ← n1 + h(q+1)
35:     A1 ← A1 + h(q+1)·(q+1)             ▷ Eqn. (11.65)
36:     B1 ← B1 + h(q+1)·(q+1)²            ▷ Eqn. (11.66)
37:     S1(q) ← 1/12 + (B1 − A1²/n1)/n1 if n1 > 0, otherwise 0    ▷ Eqn. (11.61)
38:   return (S0, S1, N)


11.2  Local  Adaptive  Thresholding 

In  many  situations,  a  fixed  threshold  is  not  appropriate  to  classify 
the  pixels  in  the  entire  image,  for  example,  when  confronted  with 
stained  backgrounds  or  uneven  lighting  or  exposure.  Figure  11.7 
shows  a  typical,  unevenly  exposed  document  image  and  the  results 
obtained  with  some  global  thresholding  methods  described  in  the 
previous  sections. 




Fig. 11.7

Global thresholding methods fail under uneven lighting or exposure. Original image (a), results from global thresholding with various methods described above (b)–(d).

(a) Original   (b) Otsu   (c) Max. entropy   (d) Min. error


Instead of using a single threshold value for the whole image, adaptive thresholding specifies a varying threshold value Q(u,v) for each image position that is used to classify the corresponding pixel I(u,v) in the same way as described in Eqn. (11.1) for a global threshold. The following approaches differ only with regard to how the threshold “surface” Q is derived from the input image.
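Applying a threshold surface is the same per-pixel comparison as with a global threshold, just with a different threshold at every position. The small Java sketch below (the class name ThresholdSurface is hypothetical) stores images as int[u][v] arrays and follows the convention used for the global methods in this chapter, mapping values ≤ Q(u,v) to 0 and all others to 1.

```java
/** Hypothetical helper: binarize an image with a per-pixel threshold surface. */
final class ThresholdSurface {

    // I and Q are indexed [u][v] and must have the same size
    static int[][] apply(int[][] I, int[][] Q) {
        int M = I.length, N = I[0].length;
        int[][] B = new int[M][N];
        for (int u = 0; u < M; u++) {
            for (int v = 0; v < N; v++) {
                // 0 = at or below the local threshold, 1 = above it
                B[u][v] = (I[u][v] <= Q[u][v]) ? 0 : 1;
            }
        }
        return B;
    }
}
```

With a constant Q, this reduces to ordinary global thresholding.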

11.2.1  Bernsen’s  Method 

The method proposed by Bernsen [23] specifies a dynamic threshold for each image position (u,v), based on the minimum and maximum intensity found in a local neighborhood R(u,v). If

$$I_{\min}(u,v) = \min_{(i,j) \in R(u,v)} I(i,j), \qquad I_{\max}(u,v) = \max_{(i,j) \in R(u,v)} I(i,j)$$

are the minimum and maximum intensity values within a fixed-size neighborhood region R centered at position (u,v), the space-varying threshold is simply calculated as the mid-range value

$$Q(u,v) = \frac{I_{\min}(u,v) + I_{\max}(u,v)}{2}. \qquad (11.69)$$

This is done as long as the local contrast c(u,v) = I_max(u,v) − I_min(u,v) is above some predefined limit c_min. If c(u,v) < c_min, the pixels in the corresponding image region are assumed to belong to a single class and are (by default) assigned to the background.

The whole process is summarized in Alg. 11.7. Note that the meaning of “background” in terms of intensity levels depends on the application. For example, in astronomy, the image background is usually darker than the objects of interest. In typical OCR applications, however, the background (paper) is brighter than the foreground objects (print). The main function provides a control parameter bg to select the proper default threshold q̄, which is set to K in case of a dark background (bg = dark) and to 0 for a bright background (bg = bright). The support region R may be square or circular, typically with a radius r = 15. The choice of the minimum contrast limit c_min depends on the type of imagery and the noise level (c_min = 15 is a suitable value to start with).
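A direct, unoptimized Java version of Bernsen's method is sketched below. It scans the circular region explicitly for every pixel, so its cost grows with r², unlike the filter-based implementation discussed later in this section. The class name BernsenSketch, the int[u][v] image layout, the clipping of the region at the image border, and the integer (rounded-down) mid-range value are our own assumptions.

```java
/** Hypothetical brute-force sketch of Bernsen's local-contrast thresholding. */
final class BernsenSketch {

    /**
     * Returns the threshold map Q for image I (indexed [u][v]), region radius r,
     * minimum contrast cMin, background type, and number of intensity levels K.
     */
    static int[][] thresholdMap(int[][] I, int r, int cMin, boolean darkBackground, int K) {
        int M = I.length, N = I[0].length;
        int qDefault = darkBackground ? K : 0;           // default threshold for flat regions
        int[][] Q = new int[M][N];
        for (int u = 0; u < M; u++) {
            for (int v = 0; v < N; v++) {
                int iMin = Integer.MAX_VALUE, iMax = Integer.MIN_VALUE;
                // scan the circular region of radius r, clipped at the image border
                for (int i = Math.max(0, u - r); i <= Math.min(M - 1, u + r); i++) {
                    for (int j = Math.max(0, v - r); j <= Math.min(N - 1, v + r); j++) {
                        int du = u - i, dv = v - j;
                        if (du * du + dv * dv <= r * r) {
                            iMin = Math.min(iMin, I[i][j]);
                            iMax = Math.max(iMax, I[i][j]);
                        }
                    }
                }
                int c = iMax - iMin;                     // local contrast
                Q[u][v] = (c >= cMin) ? (iMin + iMax) / 2 : qDefault;
            }
        }
        return Q;
    }
}
```

For a 3 × 3 image containing a single bright pixel (100) on a zero background, with r = 1 and c_min = 15, the center and its four direct neighbors receive Q = 50, while the corner pixels, whose regions contain only zeros, fall back to the default threshold (K for a dark background).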

Figure  11.8  shows  the  results  of  Bernsen’s  method  on  the  uneven 
test  image  used  in  Fig.  11.7  for  different  settings  of  the  region’s  ra¬ 
dius  r.  Due  to  the  nonlinear  min-  and  max-operation,  the  resulting 
1:  BernsenThreshold(I, r, cmin, bg)
    Input: I, intensity image of size M × N; r, radius of support
    region; cmin, minimum contrast; bg, background type (dark or
    bright). Returns a map with an individual threshold value for
    each image position.
2:      (M, N) ← Size(I)
3:      Create map Q : M × N → ℝ
4:      q̄ ← K if bg = dark, 0 if bg = bright
5:      for all image coordinates (u, v) ∈ M × N do
6:          R ← MakeCircularRegion(u, v, r)
7:          Imin ← min{I(i, j) | (i, j) ∈ R}
8:          Imax ← max{I(i, j) | (i, j) ∈ R}
9:          c ← Imax − Imin
10:         Q(u, v) ← (Imin + Imax)/2 if c ≥ cmin, q̄ otherwise
11:     return Q

12: MakeCircularRegion(u, v, r)
    Returns the set of pixel coordinates within the circle of radius r,
    centered at (u, v)
13:     return {(i, j) ∈ ℤ² | (u − i)² + (v − j)² ≤ r²}

11.2  Local  Adaptive 
Thresholding 

Alg.  11.7 

Adaptive  thresholding  using 
local  contrast  (after  Bernsen 
[23]).  The  argument  to  bg 
should  be  set  to  dark  if  the 
image  background  is  darker 
than  the  structures  of  interest, 
and  to  bright  if  the  background 
is  brighter  than  the  objects. 


threshold surface is not smooth. The minimum contrast is set to
cmin = 15, which is too low to avoid thresholding low-contrast noise
visible along the left image margin. By increasing the minimum
contrast cmin, more neighborhoods are considered “flat” and thus ignored,
that is, classified as background. This is demonstrated in Fig.
11.9. While larger values of cmin effectively eliminate low-contrast
noise, relevant structures are also lost, which illustrates the difficulty
of finding a suitable global value for cmin. Additional examples, using
the test images previously used for global thresholding, are shown in
Fig. 11.10.

What Alg. 11.7 describes formally can be implemented quite efficiently,
noting that the calculation of local minima and maxima over
a sliding window (lines 6–8) corresponds to a simple nonlinear filter
operation (see Ch. 5, Sec. 5.4). To perform these calculations, we
can use a minimum and maximum filter with radius r, as provided
by virtually every image processing environment. For example, the
Java implementation of the Bernsen thresholder in Prog. 11.1 uses
ImageJ’s built-in RankFilters class for this purpose. The complete
implementation can be found on the book’s website (see Sec. 11.3 for
additional details on the corresponding API).
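To make the structure of Alg. 11.7 concrete without depending on ImageJ, here is a minimal brute-force sketch in plain Java. It is not the book's implementation (Prog. 11.1 uses RankFilters). It assumes an 8-bit image stored as int[v][u], scans the disk of MakeCircularRegion() directly, and simply skips disk positions outside the image, which is an assumption about border handling. For the dark-background default it uses 255 instead of K = 256 so that the map stays within 8 bits. If pixels are classified as "above threshold" versus "not above", both values put a flat region entirely on the background side. The class name BernsenSketch is hypothetical.

```java
/** Brute-force sketch of Bernsen's method (Alg. 11.7); hypothetical helper class. */
final class BernsenSketch {

    /**
     * Returns the threshold map Q for an 8-bit image I indexed as I[v][u].
     * Where the local contrast c = Imax - Imin is below cMin, the default
     * threshold qBar is used: 255 for a dark background, 0 for a bright one.
     */
    static int[][] threshold(int[][] I, double r, int cMin, boolean darkBg) {
        int N = I.length, M = I[0].length;
        int qBar = darkBg ? 255 : 0;          // 255 instead of K = 256 keeps Q within 8 bits
        int ri = (int) Math.ceil(r);
        int[][] Q = new int[N][M];
        for (int v = 0; v < N; v++) {
            for (int u = 0; u < M; u++) {
                int iMin = Integer.MAX_VALUE, iMax = Integer.MIN_VALUE;
                // scan the disk (i - u)^2 + (j - v)^2 <= r^2, skipping positions outside the image
                for (int j = Math.max(0, v - ri); j <= Math.min(N - 1, v + ri); j++) {
                    for (int i = Math.max(0, u - ri); i <= Math.min(M - 1, u + ri); i++) {
                        if ((i - u) * (i - u) + (j - v) * (j - v) <= r * r) {
                            iMin = Math.min(iMin, I[j][i]);
                            iMax = Math.max(iMax, I[j][i]);
                        }
                    }
                }
                int c = iMax - iMin;                              // local contrast
                Q[v][u] = (c >= cMin) ? (iMin + iMax) / 2 : qBar; // midrange or default
            }
        }
        return Q;
    }
}
```

The brute-force search costs O(r²) operations per pixel. A min/max filter, as used in Prog. 11.1, computes equivalent local extrema far more efficiently, up to differences in disk shape and border handling.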

11.2.2  Niblack’s  Method 

In this approach, originally presented in [172, Sec. 5.1], the threshold
Q(u, v) is varied across the image as a function of the local intensity
average μ_R(u, v) and standard deviation¹¹ σ_R(u, v) in the form


¹¹ The standard deviation σ is the square root of the variance σ².
11 Automatic
Thresholding 

Prog.  11.1 

Bernsen’s thresholder (ImageJ plugin implementation
of Alg. 11.7). Note the use
of ImageJ’s RankFilters class
(lines 30–32) for calculating
the local minimum (Imin) and
maximum (Imax) maps inside
the getThreshold() method.

The resulting threshold surface
Q(u, v) is returned as an 8-bit
image of type ByteProcessor.


1   package imagingbook.pub.threshold.adaptive;
2   import ij.plugin.filter.RankFilters;
3   import ij.process.ByteProcessor;
4   import imagingbook.pub.threshold.BackgroundMode;
5
6   public class BernsenThresholder extends AdaptiveThresholder {
7
8       public static class Parameters {
9           public int radius = 15;
10          public int cmin = 15;
11          public BackgroundMode bgMode = BackgroundMode.DARK;
12      }
13
14      private final Parameters params;
15
16      public BernsenThresholder() {
17          this.params = new Parameters();
18      }
19
20      public BernsenThresholder(Parameters params) {
21          this.params = params;
22      }
23
24      public ByteProcessor getThreshold(ByteProcessor I) {
25          int M = I.getWidth();
26          int N = I.getHeight();
27          ByteProcessor Imin = (ByteProcessor) I.duplicate();
28          ByteProcessor Imax = (ByteProcessor) I.duplicate();
29
30          RankFilters rf = new RankFilters();
31          rf.rank(Imin, params.radius, RankFilters.MIN);  // Imin(u,v)
32          rf.rank(Imax, params.radius, RankFilters.MAX);  // Imax(u,v)
33
34          int q = (params.bgMode == BackgroundMode.DARK) ?
35                  255 : 0;  // default threshold (256 would wrap to 0 in a byte)
36          ByteProcessor Q = new ByteProcessor(M, N);  // Q(u,v)
37
38          for (int v = 0; v < N; v++) {
39              for (int u = 0; u < M; u++) {
40                  int gMin = Imin.get(u, v);
41                  int gMax = Imax.get(u, v);
42                  int c = gMax - gMin;
43                  if (c >= params.cmin)
44                      Q.set(u, v, (gMin + gMax) / 2);
45                  else
46                      Q.set(u, v, q);
47              }
48          }
49          return Q;
50      }
51  }
(b) cmin = 30    (c) cmin = 60




Fig.  11.8 

Adaptive thresholding using
Bernsen’s method. Original
image (a), local minimum (b),
and maximum (c). The center
row shows the binarized
images for different settings
of r (d–f). The corresponding
curves in (g–i) show the
original intensity (gray), local
minimum (green), maximum
(red), and the actual threshold
(blue) along the horizontal
line marked in (a–c). The region
radius r is 15 pixels, the
minimum contrast cmin is 15
intensity units.


Fig.  11.9 

Adaptive thresholding using
Bernsen’s method with different
settings of cmin. Binarized
images (top row) and threshold
surface Q(u, v) (bottom row).
Black areas in the threshold
functions indicate that the local
contrast is below cmin; the
corresponding pixels are classified
as background (white in
this case).


$$Q(u,v) := \mu_R(u,v) + \kappa \cdot \sigma_R(u,v). \tag{11.70}$$

Thus the local threshold Q(u, v) is determined by adding a constant
portion (κ > 0) of the local standard deviation σ_R to the local mean
μ_R. Both μ_R and σ_R are calculated over a square support region R centered
at (u, v). The size (radius) of the averaging region R should be as
large as possible, at least larger than the size of the structures to be
detected, but small enough to capture the variations (unevenness)




Fig.  11.10 

Additional examples for
Bernsen’s method. Original
images (a–d), local minimum
Imin (e–h), maximum Imax
(i–l), and threshold map Q
(m–p); results after thresholding
the images (q–t). Settings
are r = 15, cmin = 15.
A bright background is assumed
for all images (bg =
bright), except for image (d).


of the background. A size of 31 × 31 pixels (or radius r = 15) is
suggested in [172] and κ = 0.18, though the latter does not seem to
be critical.

One problem is that, for small values of σ_R (as obtained in “flat”
image regions of approximately constant intensity), the threshold will
be close to the local average, which makes the segmentation quite
sensitive to low-amplitude noise (“ghosting”). A simple improvement
is to secure a minimum distance from the mean by adding a constant
offset d, that is, replacing Eqn. (11.70) by
$$Q(u,v) := \mu_R(u,v) + \kappa \cdot \sigma_R(u,v) + d, \tag{11.71}$$


with d ≥ 0, in the range 2, …, 20 for typical 8-bit images.

The original formulation (Eqn. (11.70)) is aimed at situations
where the foreground structures are brighter than the background
(Fig. 11.11(a)) but does not work if the images are set up the other
way round (Fig. 11.11(b)). In the case that the structures of interest
are darker than the background (as, e.g., in typical OCR applications),
one could either work with inverted images or modify the
calculation of the threshold to


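$$Q(u,v) := \begin{cases} \mu_R(u,v) + \bigl(\kappa \cdot \sigma_R(u,v) + d\bigr) & \text{for dark BG},\\ \mu_R(u,v) - \bigl(\kappa \cdot \sigma_R(u,v) + d\bigr) & \text{for bright BG}. \end{cases} \tag{11.72}$$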
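Once μ_R and σ_R are known at a position, Eqn. (11.72) is a one-line rule. The following sketch (hypothetical names) assumes the result is rounded and clamped to the 8-bit range, as Prog. 11.2 does. For example, with μ_R = 100, σ_R = 20, κ = 0.3, and d = 5 the offset is 0.3 · 20 + 5 = 11, so Q = 111 for a dark and Q = 89 for a bright background.

```java
/** Per-pixel threshold of the modified Niblack method, Eqn. (11.72); hypothetical helper. */
final class NiblackRule {

    /**
     * mu, sigma: local mean and standard deviation; kappa: variance control;
     * d: minimum offset (d = 0 gives Niblack's original formulation).
     */
    static int threshold(double mu, double sigma, double kappa, double d, boolean darkBg) {
        double offset = kappa * sigma + d;             // non-negative for kappa, d >= 0
        double q = darkBg ? mu + offset : mu - offset; // above or below the local mean
        return (int) Math.max(0, Math.min(255, Math.rint(q))); // round, clamp to 0..255
    }
}
```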


(a) Dark background    (b) Bright background


The modified procedure is detailed in Alg. 11.8. The example
in Fig. 11.12 shows results obtained with this method on an image
with a bright background containing dark structures, for κ = 0.3
and varying settings of d. Note that setting d = 0 (Fig. 11.12(d,
g)) corresponds to Niblack’s original method. For these examples,
a circular window of radius r = 15 was used to compute the local
mean μ_R(u, v) and variance σ²_R(u, v). Additional examples are shown
in Fig. 11.13. Note that the selected radius r is obviously too small
for the structures in the images in Fig. 11.13(c, d), which are thus
not segmented cleanly. Better results can be expected with a larger
radius.


Fig.  11.11 

Adaptive thresholding based
on average local intensity. The
illustration shows a line profile
as typically found in document
imaging. The space-variant
threshold Q (dotted blue line)
is chosen as the local average
μ_R (dashed green line) offset
by a multiple of the local intensity
variation σ_R. The offset
is chosen to be positive for images
with a dark background
and bright structures (a) and
negative if the background is
brighter than the contained
structures of interest (b).


With the intent to improve upon Niblack’s method, particularly
for thresholding deteriorated text images, Sauvola and Pietikäinen
[207] proposed setting the threshold to


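$$Q(u,v) := \begin{cases} \mu_R(u,v) \cdot \Bigl[1 - \kappa \cdot \Bigl(\dfrac{\sigma_R(u,v)}{\sigma_{\max}} - 1\Bigr)\Bigr] & \text{for dark BG},\\[1ex] \mu_R(u,v) \cdot \Bigl[1 + \kappa \cdot \Bigl(\dfrac{\sigma_R(u,v)}{\sigma_{\max}} - 1\Bigr)\Bigr] & \text{for bright BG}, \end{cases} \tag{11.73}$$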


with κ = 0.5 and σ_max = 128 (the “dynamic range of the standard
deviation” for 8-bit images) as suggested parameter values. In this
approach, the offset between the threshold and the local average not
only depends on the local variation σ_R (as in Eqn. (11.70)), but also
on the magnitude of the local mean μ_R. Thus, changes in absolute
brightness lead to modified relative threshold values, even when the
image contrast remains constant. Though this technique is frequently
referenced in the literature, it appears questionable whether this behavior
is generally desirable.
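The brightness dependence questioned above is easy to see numerically. The sketch below (hypothetical names) evaluates Eqn. (11.73) with the suggested κ = 0.5 and σ_max = 128. For a bright background and a fixed σ_R = 20, a local mean of μ_R = 100 gives Q = 57.8125, while μ_R = 200 gives Q = 115.625. The distance between threshold and mean doubles (42.1875 vs. 84.375) although the local contrast is unchanged.

```java
/** Sauvola/Pietikainen threshold, Eqn. (11.73); hypothetical helper class. */
final class SauvolaRule {

    /** kappa = 0.5 and sigmaMax = 128 are the suggested values for 8-bit images. */
    static double threshold(double mu, double sigma, double kappa, double sigmaMax,
                            boolean darkBg) {
        double t = kappa * (sigma / sigmaMax - 1);   // negative as long as sigma < sigmaMax
        return darkBg ? mu * (1 - t) : mu * (1 + t); // above or below the local mean
    }
}
```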


Calculating  local  mean  and  variance 

Algorithm 11.8 shows the basic operation of Niblack’s method
and also illustrates how to efficiently calculate the local average and
Fig. 11.12

Adaptive thresholding using
Niblack’s method (with r = 15,
κ = 0.3). Original image (a),
local mean (b), and standard
deviation σ_R (c). The
result for d = 0 in (d) corresponds
to Niblack’s original
formulation. Increasing the
value of d reduces the amount
of clutter in regions with low
variance (e, f). The curves in
(g–i) show the local intensity
(gray), mean (green), variance
(red), and the actual
threshold (blue) along the horizontal
line marked in (a–c).


(d) d = 0    (e) d = 5    (f) d = 10
variance. Given the image I and the averaging region R, we can use
the shortcut suggested in Eqn. (3.12) to obtain these quantities as

$$\mu_R = \frac{1}{n} \cdot A \qquad \text{and} \qquad \sigma_R^2 = \frac{1}{n} \cdot \Bigl(B - \frac{1}{n} \cdot A^2\Bigr), \tag{11.74}$$

with

$$A = \sum_{(i,j) \in R} I(i,j), \qquad B = \sum_{(i,j) \in R} I^2(i,j), \qquad n = |R|. \tag{11.75}$$

Procedure GetLocalMeanAndVariance() in Alg. 11.8 shows this calculation
in full detail.
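A direct Java transcription of GetLocalMeanAndVariance() might look as follows. It is a sketch under stated assumptions: the image is an int[v][u] array of 8-bit values, the disk test is (i − u)² + (j − v)² ≤ r², and only pixels inside the image are counted (the second border strategy discussed below). Accumulating A and B in long integers keeps the sums exact, so only the final divisions use floating point. The class name LocalStats is hypothetical.

```java
/** Local mean and variance over a disk via the sums of Eqns. (11.74)-(11.75); hypothetical helper. */
final class LocalStats {

    /** Returns {mu, sigma2, n} for the disk of radius r at (u, v), using only pixels inside I. */
    static double[] meanAndVariance(int[][] I, int u, int v, double r) {
        int N = I.length, M = I[0].length;
        int ri = (int) Math.floor(r);
        long n = 0, A = 0, B = 0;              // integer sums are exact for 8-bit data
        for (int j = Math.max(0, v - ri); j <= Math.min(N - 1, v + ri); j++) {
            for (int i = Math.max(0, u - ri); i <= Math.min(M - 1, u + ri); i++) {
                if ((i - u) * (i - u) + (j - v) * (j - v) <= r * r) {
                    int p = I[j][i];
                    n += 1;
                    A += p;                    // A = sum of I(i,j)
                    B += (long) p * p;         // B = sum of I(i,j)^2
                }
            }
        }
        double mu = (double) A / n;                     // Eqn. (11.74), mean
        double sigma2 = (B - (double) A * A / n) / n;   // Eqn. (11.74), variance
        return new double[] {mu, sigma2, n};
    }
}
```

With this disk test, r = 4.5 yields n = 69 covered pixels in the image interior and n = 22 at an image corner, which happen to match the counts quoted in the caption of Fig. 11.14.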

When computing the local average and variance, attention must
be paid to the situation at the image borders, as illustrated in Fig.
11.14. Two approaches are frequently used. In the first approach
(following the common practice for implementing filter operations),
all outside pixel values are replaced by the closest inside pixel, which
is always a border pixel. Thus the border pixel values are effectively
replicated outside the image boundaries, and these pixels therefore have
a strong influence on the local results. The second approach is to
perform the calculation of the average and variance on only those
image pixels that are actually covered by the support region. In this
case, the number of pixels (N) is reduced at the image borders, down
to about 1/4 of the full region size at the image corners.
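The first strategy (replicating border pixels) can be sketched by clamping the coordinates of each disk sample to the image, so that every position always uses the full set of disk samples. This hypothetical helper computes only the mean. The variance would be accumulated in the same way. In the test below, a single bright corner pixel is sampled three times by a radius-1 disk placed at that corner (mean 60 instead of 100/3 with the covered-pixels-only strategy), which illustrates the strong influence of replicated border pixels.

```java
/** Local mean over a disk with border replication (coordinate clamping); hypothetical helper. */
final class ClampedStats {

    static double mean(int[][] I, int u, int v, double r) {
        int N = I.length, M = I[0].length;
        int ri = (int) Math.floor(r);
        long n = 0, A = 0;
        for (int dy = -ri; dy <= ri; dy++) {
            for (int dx = -ri; dx <= ri; dx++) {
                if (dx * dx + dy * dy <= r * r) {
                    // outside coordinates are replaced by the closest inside pixel
                    int i = Math.min(M - 1, Math.max(0, u + dx));
                    int j = Math.min(N - 1, Math.max(0, v + dy));
                    A += I[j][i];
                    n++;
                }
            }
        }
        return (double) A / n;   // n is always the full disk size
    }
}
```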

Although the calculation of the local mean and variance outlined
by function GetLocalMeanAndVariance() in Alg. 11.8 is definitely more


1:  NiblackThreshold(I, r, κ, d, bg)
    Input: I, intensity image of size M × N; r, radius of support
    region; κ, variance control parameter; d, minimum offset;
    bg ∈ {dark, bright}, background type. Returns a map with an
    individual threshold value for each image position.
2:      (M, N) ← Size(I)
3:      Create map Q : M × N → ℝ
4:      for all image coordinates (u, v) ∈ M × N do
            Define a support region of radius r, centered at (u, v):
5:          (μ, σ²) ← GetLocalMeanAndVariance(I, u, v, r)
6:          σ ← √σ²                                 ▷ local std. deviation σ_R
7:          Q(u, v) ← μ + (κ · σ + d) if bg = dark,
                      μ − (κ · σ + d) if bg = bright      ▷ Eq. 11.72
8:      return Q

9:  GetLocalMeanAndVariance(I, u, v, r)
    Returns the local mean and variance of the image pixels I(i, j)
    within the disk-shaped region with radius r around position
    (u, v).
10:     R ← MakeCircularRegion(u, v, r)              ▷ see Alg. 11.7
11:     n ← 0
12:     A ← 0
13:     B ← 0
14:     for all (i, j) ∈ R do
15:         n ← n + 1
16:         A ← A + I(i, j)
17:         B ← B + I²(i, j)
18:     μ ← (1/n) · A
19:     σ² ← (1/n) · (B − (1/n) · A²)
20:     return (μ, σ²)



Alg.  11.8 

Adaptive thresholding using
local mean and variance
(modified version of Niblack’s
method [172]). The argument
to bg should be dark if the image
background is darker than
the structures of interest, bright
if the background is brighter
than the objects. The function
MakeCircularRegion() is defined
in Alg. 11.7.


efficient than a brute-force approach, additional optimizations are
possible. Most image processing environments have suitable routines
already built in. With ImageJ, for example, we can again use the
RankFilters class (as with the min- and max-filters in the Bernsen
approach, see Sec. 11.2.1). Instead of performing the computation for
each pixel individually, the following ImageJ code segment uses predefined
filters to compute two separate images Imean (μ_R) and Ivar
(σ²_R) containing the local mean and variance values, respectively, with
a disk-shaped support region of radius 15:

ByteProcessor I;   // original image I(u,v)
int radius = 15;

FloatProcessor Imean = I.convertToFloatProcessor();
FloatProcessor Ivar = (FloatProcessor) Imean.duplicate();

RankFilters rf = new RankFilters();
rf.rank(Imean, radius, RankFilters.MEAN);     // μ_R(u,v)
rf.rank(Ivar, radius, RankFilters.VARIANCE);  // σ²_R(u,v)
Fig. 11.13

Additional examples for
thresholding with Niblack’s
method using a disk-shaped
support region of radius
r = 15. Original images (a–d),
local mean μ_R (e–h), std. deviation
σ_R (i–l), and threshold
Q (m–p); results after thresholding
the images (q–t). The
background is assumed to be
brighter than the structures of
interest, except for image (d),
which has a dark background.
Settings are κ = 0.3, d = 5.
Panel rows: Original image (a–d); Local average μ_R(u, v) (e–h); Local standard deviation σ_R(u, v) (i–l); Local threshold Q(u, v) (m–p); Binary image (q–t).


See Sec. 11.3 and the online code for additional implementation details.
Note that the filter methods implemented in RankFilters
use replication of border pixels as their border handling strategy,
as discussed earlier.

Local  average  and  variance  with  Gaussian  kernels 

The purpose of taking the local average is to smooth the image to
obtain an estimate of the varying background intensity. In case of
a square or circular region, this is equivalent to convolving the image
with a box- or disk-shaped kernel, respectively. Kernels of this
type, however, are not well suited for image smoothing, because they
create strong ringing and truncating effects, as demonstrated in Fig.
11.15. Moreover, convolution with a box-shaped (rectangular) kernel
is a non-isotropic operation, that is, the results are orientation-dependent.
From this perspective alone it seems appropriate to consider
other smoothing kernels, Gaussian kernels in particular.




Fig.  11.14 

Calculating local statistics at
image boundaries. The illustration
shows a disk-shaped
support region with radius r,
placed at the image border.
Pixel values outside the image
can be replaced (“filled-in”)
by the closest border pixel,
as is common in many filter
operations. Alternatively, the
calculation of the local statistics
can be confined to include
only those pixels inside the image
that are actually covered
by the support region. At any
border pixel, the number of
covered elements (N) is still
more than ≈ 1/4 of the full
region size. In this particular
case, the circular region covers
a maximum of N = 69 pixels
when fully embedded and
N = 22 when positioned at an
image corner.


Box    Disk    Gaussian


Fig.  11.15 

Local average (a–c) and variance
(d–f) obtained with different
smoothing kernels. 31 × 31
box filter (a, d), disk filter with
radius r = 15 (b, e), Gaussian
kernel with σ = 0.6 · 15 = 9.0
(c, f). Both the box and the disk
filter show strong truncation
effects (ringing); the box filter
is also highly non-isotropic. All
images are contrast-enhanced
for better visibility.


Using a Gaussian kernel H^G for smoothing is equivalent to calculating
a weighted average of the corresponding image pixels, with
the weights being the coefficients of the kernel. Thus calculating this
weighted local average can be expressed by


$$\mu_G(u,v) = \frac{1}{\Sigma H^G} \cdot (I * H^G)(u,v), \tag{11.76}$$


where ΣH^G is the sum of the coefficients in the kernel H^G and *
denotes the linear convolution operator.¹² Analogously, there is also


¹² See Chapter 5, Sec. 5.3.1.
a weighted variance σ²_G, which can be calculated jointly with the local
average μ_G (as in Eqn. (11.74)) in the form

$$\mu_G(u,v) = \frac{1}{\Sigma H^G} \cdot A_G(u,v), \tag{11.77}$$

$$\sigma_G^2(u,v) = \frac{1}{\Sigma H^G} \cdot \Bigl(B_G(u,v) - \frac{1}{\Sigma H^G} \cdot A_G^2(u,v)\Bigr), \tag{11.78}$$

with A_G = I * H^G and B_G = I² * H^G.

Thus all we need are two filter operations, one applied to the
original image (I * H^G) and another applied to the squared image
(I² * H^G), using the same 2D Gaussian kernel H^G (or any other
suitable smoothing kernel). If the kernel H^G is normalized (i.e.,
ΣH^G = 1), Eqns. (11.77)–(11.78) reduce to

$$\mu_G(u,v) = A_G(u,v), \tag{11.79}$$

$$\sigma_G^2(u,v) = B_G(u,v) - A_G^2(u,v), \tag{11.80}$$

with A_G, B_G as defined above.
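Eqns. (11.77)–(11.80) can be checked at a single position with a handful of weights. In the sketch below (hypothetical names), p holds the pixel values under the kernel and w the corresponding kernel coefficients. The three sums play the roles of ΣH^G, A_G, and B_G at that position. Because μ_G and σ²_G are ratios of these sums, scaling all weights by a constant leaves the result unchanged. This is why the simpler Eqns. (11.79)–(11.80) suffice once ΣH^G = 1. For p = (1, 2, 3) and w = (1, 2, 1) the result is μ_G = 2 and σ²_G = 0.5.

```java
/** Weighted local mean and variance at one position, Eqns. (11.77)-(11.78); hypothetical helper. */
final class WeightedStats {

    /** p: pixel values under the kernel; w: corresponding kernel coefficients (same length). */
    static double[] meanAndVariance(double[] p, double[] w) {
        double S = 0, A = 0, B = 0;
        for (int k = 0; k < p.length; k++) {
            S += w[k];                // sum of kernel coefficients
            A += w[k] * p[k];         // this position's value of A_G = I * H
            B += w[k] * p[k] * p[k];  // this position's value of B_G = I^2 * H
        }
        double mu = A / S;                    // Eqn. (11.77)
        double sigma2 = (B - A * A / S) / S;  // Eqn. (11.78)
        return new double[] {mu, sigma2};
    }
}
```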

This suggests a very simple process for computing the local average
and variance by Gaussian filtering, as summarized in Alg. 11.9.
The width (standard deviation σ) of the Gaussian kernel is set to 0.6
times the radius r of the corresponding disk filter to produce a similar
effect as Alg. 11.8. The Gaussian approach has two advantages:
First, the Gaussian is a much better low-pass filter than
the box or disk kernels. Second, the 2D Gaussian is (unlike the
circular disk kernel) separable in the x- and y-direction, which permits
a very efficient implementation of the 2D filter using only a pair
of 1D convolutions (see Ch. 5, Sec. 5.2).
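The separability argument can be illustrated with a small plain-Java sketch. It computes A_G = I * H^G and B_G = I² * H^G with two 1D passes each and then applies Eqns. (11.79)–(11.80). The sketch makes these assumptions:

- σ = 0.6 · r, as in the text.
- The kernel half-width is ⌈3.5σ⌉, borrowed from MakeGaussianKernel2D() in Alg. 11.9.
- Border pixels are replicated.
- A max(0, ·) guard protects against small negative values of B_G − A_G² caused by rounding.

The class and method names are hypothetical. In ImageJ one would use GaussianBlur instead.

```java
/** Local mean and std. deviation by separable Gaussian filtering; hypothetical helper. */
final class GaussStats {

    /** Normalized 1D Gaussian kernel with half-width ceil(3.5 * sigma). */
    static double[] kernel1D(double sigma) {
        int r = Math.max(1, (int) Math.ceil(3.5 * sigma));
        double[] h = new double[2 * r + 1];
        double s = 0;
        for (int x = -r; x <= r; x++) {
            h[x + r] = Math.exp(-(x * x) / (2 * sigma * sigma));
            s += h[x + r];
        }
        for (int k = 0; k < h.length; k++) {
            h[k] /= s;                          // normalize: coefficients sum to 1
        }
        return h;
    }

    /** Convolves f (indexed [v][u]) with h along x, then along y; borders are replicated. */
    static double[][] blur(double[][] f, double[] h) {
        int N = f.length, M = f[0].length, r = h.length / 2;
        double[][] tmp = new double[N][M], out = new double[N][M];
        for (int v = 0; v < N; v++) {           // horizontal pass
            for (int u = 0; u < M; u++) {
                double s = 0;
                for (int k = -r; k <= r; k++) {
                    s += h[k + r] * f[v][Math.min(M - 1, Math.max(0, u - k))];
                }
                tmp[v][u] = s;
            }
        }
        for (int v = 0; v < N; v++) {           // vertical pass
            for (int u = 0; u < M; u++) {
                double s = 0;
                for (int k = -r; k <= r; k++) {
                    s += h[k + r] * tmp[Math.min(N - 1, Math.max(0, v - k))][u];
                }
                out[v][u] = s;
            }
        }
        return out;
    }

    /** Returns {mu, sigma} maps via Eqns. (11.79)-(11.80), using sigma_G = 0.6 * radius. */
    static double[][][] meanAndStdDev(int[][] I, double radius) {
        int N = I.length, M = I[0].length;
        double[][] A = new double[N][M], B = new double[N][M];
        for (int v = 0; v < N; v++) {
            for (int u = 0; u < M; u++) {
                A[v][u] = I[v][u];
                B[v][u] = (double) I[v][u] * I[v][u];
            }
        }
        double[] h = kernel1D(0.6 * radius);
        A = blur(A, h);                          // A_G = I * H
        B = blur(B, h);                          // B_G = I^2 * H
        double[][] sigma = new double[N][M];
        for (int v = 0; v < N; v++) {
            for (int u = 0; u < M; u++) {
                sigma[v][u] = Math.sqrt(Math.max(0, B[v][u] - A[v][u] * A[v][u]));
            }
        }
        return new double[][][] {A, sigma};
    }
}
```

Each output pixel costs 2(2r + 1) multiplications per filtered image instead of (2r + 1)² for a direct 2D convolution.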

For practical calculation, A_G and B_G can be represented as (floating-point)
images, and most modern image-processing environments
provide efficient (multi-scale) implementations of Gaussian filters
with large-size kernels. In ImageJ, fast Gaussian filtering is implemented
by the class GaussianBlur with the public methods blur(),
blurGaussian(), and blurFloat(), which all use normalized filter
kernels by default. Programs 11.2–11.3 show the complete ImageJ
implementation of Niblack’s thresholder using Gaussian smoothing
kernels.


11.3  Java  Implementation 

All thresholding methods described in this chapter have been implemented
as part of the imagingbook library that is available with
full source code at the book’s website. The top class in this
library¹³ is Thresholder with the sub-classes GlobalThresholder and
AdaptiveThresholder for the methods described in Secs. 11.1 and
11.2, respectively. Class Thresholder itself is abstract and only defines
a set of (non-public) utility methods for histogram analysis.
¹³ Package imagingbook.pub.threshold.


1:  AdaptiveThresholdGauss(I, r, κ, d, bg)
    Input: I, intensity image of size M × N; r, support region radius;
    κ, variance control parameter; d, minimum offset; bg ∈
    {dark, bright}, background type.
    Returns a map Q of local thresholds for the grayscale image I.
2:      (M, N) ← Size(I)
3:      Create maps A, B, Q : M × N → ℝ
4:      for all image coordinates (u, v) ∈ M × N do
5:          A(u, v) ← I(u, v)
6:          B(u, v) ← (I(u, v))²
7:      H^G ← MakeGaussianKernel2D(0.6 · r)
8:      A ← A * H^G                   ▷ filter the original image with H^G
9:      B ← B * H^G                   ▷ filter the squared image with H^G
10:     for all image coordinates (u, v) ∈ M × N do
11:         μ_G ← A(u, v)                               ▷ Eq. 11.79
12:         σ_G ← √(B(u, v) − A²(u, v))                 ▷ Eq. 11.80
13:         Q(u, v) ← μ_G + (κ · σ_G + d) if bg = dark,
                      μ_G − (κ · σ_G + d) if bg = bright    ▷ Eq. 11.72
14:     return Q

15: MakeGaussianKernel2D(σ)
    Returns a discrete 2D Gaussian kernel H with std. deviation σ,
    sized sufficiently large to avoid truncation effects.
16:     r ← max(1, ⌈3.5 · σ⌉)          ▷ size the kernel sufficiently large
17:     Create map H : [−r, r]² → ℝ
18:     s ← 0
19:     for x ← −r, …, r do
20:         for y ← −r, …, r do
21:             H(x, y) ← e^(−(x² + y²)/(2σ²))   ▷ unnormalized 2D Gaussian
22:             s ← s + H(x, y)
23:     for x ← −r, …, r do
24:         for y ← −r, …, r do
25:             H(x, y) ← (1/s) · H(x, y)         ▷ normalize H
26:     return H



Alg.  11.9 

Adaptive thresholding using
Gaussian averaging (extended
from Alg. 11.8). Parameters
are the original image I, the
support region radius r (the
Gaussian kernel has σ = 0.6 · r),
variance control κ, and minimum
offset d.

The argument to bg should
be dark if the image background
is darker than the
structures of interest, bright
if the background is brighter
than the objects. The procedure
MakeGaussianKernel2D(σ)
creates a discrete, normalized
2D Gaussian kernel with standard
deviation σ.
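For reference, MakeGaussianKernel2D() of Alg. 11.9 translates almost line by line into Java. The sketch below (hypothetical class name) stores H(x, y) at array position [y + r][x + r]. The accompanying test also confirms that the normalized 2D kernel equals the product of its own row and column sums. This is the separability property that allows efficient filtering.

```java
/** Java version of MakeGaussianKernel2D(sigma) from Alg. 11.9; hypothetical helper class. */
final class GaussKernel2D {

    static double[][] make(double sigma) {
        int r = Math.max(1, (int) Math.ceil(3.5 * sigma)); // size the kernel sufficiently large
        int size = 2 * r + 1;
        double[][] H = new double[size][size];              // H[y + r][x + r] holds H(x, y)
        double s = 0;
        for (int x = -r; x <= r; x++) {
            for (int y = -r; y <= r; y++) {
                double h = Math.exp(-(x * x + y * y) / (2 * sigma * sigma)); // unnormalized
                H[y + r][x + r] = h;
                s += h;
            }
        }
        for (int i = 0; i < size; i++) {
            for (int j = 0; j < size; j++) {
                H[i][j] /= s;                               // normalize: sum of H is 1
            }
        }
        return H;
    }
}
```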


11.3.1  Global  Thresholding  Methods 

The  thresholding  methods  covered  in  Sec.  11.1  are  implemented  by 
the  following  classes: 

•  MeanThresholder,  MedianThresholder  (Sec.  11.1.2), 

•  QuantileThresholder  (Alg.  11.1), 

•  IsodataThresholder  (Alg.  11.2-11.3), 

•  OtsuThresholder  (Alg.  11.4), 

•  MaxEntropyThresholder  (Alg.  11.5),  and 

•  MinErrorThresholder  (Alg.  11.6). 

These  are  sub-classes  of  the  (abstract)  class  GlobalThresholder. 
The following example demonstrates the typical use of these classes
for a given ByteProcessor object I:


GlobalThresholder thr = new IsodataThresholder();
int q = thr.getThreshold(I);
Prog. 11.2

Niblack’s thresholder using
Gaussian smoothing kernels
(ImageJ implementation
of Alg. 11.9, part 1).


1   package threshold;
2
3   import ij.plugin.filter.GaussianBlur;
4   import ij.plugin.filter.RankFilters;
5   import ij.process.ByteProcessor;
6   import ij.process.FloatProcessor;
7   import imagingbook.pub.threshold.BackgroundMode;
8
9   public abstract class NiblackThresholder extends AdaptiveThresholder {
10
11      // parameters for this thresholder
12      public static class Parameters {
13          public int radius = 15;
14          public double kappa = 0.30;
15          public int dMin = 5;
16          public BackgroundMode bgMode = BackgroundMode.DARK;
17      }
18
19      private final Parameters params;  // parameter object
20
21      protected FloatProcessor Imean;   // = μ_G(u,v)
22      protected FloatProcessor Isigma;  // = σ_G(u,v)
23
24      public ByteProcessor getThreshold(ByteProcessor I) {
25          int w = I.getWidth();
26          int h = I.getHeight();
27
28          makeMeanAndVariance(I, params);
29          ByteProcessor Q = new ByteProcessor(w, h);
30
31          final double kappa = params.kappa;
32          final int dMin = params.dMin;
33          final boolean darkBg =
34              (params.bgMode == BackgroundMode.DARK);
35
36          for (int v = 0; v < h; v++) {
37              for (int u = 0; u < w; u++) {
38                  double sigma = Isigma.getf(u, v);
39                  double mu = Imean.getf(u, v);
40                  double diff = kappa * sigma + dMin;
41                  int q = (int)
42                      Math.rint((darkBg) ? mu + diff : mu - diff);
43                  if (q < 0) q = 0;
44                  if (q > 255) q = 255;
45                  Q.set(u, v, q);
46              }
47          }
48          return Q;
49      }
50
51      // continues in Prog. 11.3
52      // continued from Prog. 11.2
53
54      public static class Gauss extends NiblackThresholder {
55
56          protected void makeMeanAndVariance(ByteProcessor I,
                                               Parameters params) {
57              int width = I.getWidth();
58              int height = I.getHeight();
59
60              Imean = new FloatProcessor(width, height);
61              Isigma = new FloatProcessor(width, height);
62
63              FloatProcessor A = I.convertToFloatProcessor();  // = I
64              FloatProcessor B = I.convertToFloatProcessor();  // = I
65              B.sqr();                                         // = I²
66
67              GaussianBlur gb = new GaussianBlur();
68              double sigma = params.radius * 0.6;
69              gb.blurFloat(A, sigma, sigma, 0.002);  // = A_G
70              gb.blurFloat(B, sigma, sigma, 0.002);  // = B_G
71
72              for (int v = 0; v < height; v++) {
73                  for (int u = 0; u < width; u++) {
74                      float a = A.getf(u, v);
75                      float b = B.getf(u, v);
76                      float sigmaG =  // Eq. 11.80, clamped at 0 against rounding errors
77                          (float) Math.sqrt(Math.max(0, b - a*a));
78                      Imean.setf(u, v, a);        // = μ_G(u,v)
79                      Isigma.setf(u, v, sigmaG);  // = σ_G(u,v)
80                  }
81              }
82          }
83      }  // end of inner class NiblackThresholder.Gauss
84  }  // end of class NiblackThresholder



Prog.  11.3 

Niblack’s thresholder using
Gaussian smoothing kernels
(part 2). The floating-point
images A and B correspond
to the maps A_G (filtered original
image) and B_G (filtered
squared image) in Alg. 11.9.

An instance of the ImageJ
class GaussianBlur is created in
line 67 and subsequently used
to filter both images in lines
69–70. The last argument to
the ImageJ method blurFloat
(0.002) specifies the accuracy
of the Gaussian kernel.


if (q > 0) I.threshold(q);
else ...

Here threshold() is ImageJ’s built-in method defined by class
ImageProcessor.

11.3.2  Adaptive  Thresholding 

The techniques described in Sec. 11.2 are implemented by the following
classes:

• BernsenThresholder (Alg. 11.7),

• NiblackThresholder (Alg. 11.8, multiple versions), and

• SauvolaThresholder (Eqn. (11.73)).

These are sub-classes of the (abstract) class AdaptiveThresholder.
The following example demonstrates the typical use of these methods
for a given ByteProcessor object I:


AdaptiveThresholder thr = new BernsenThresholder();
ByteProcessor Q = thr.getThreshold(I);
thr.threshold(I, Q);


The 2D threshold surface is represented by the image Q; the method
threshold(I, Q) is defined by class AdaptiveThresholder. Alternatively,
the same operation can be performed without making Q
explicit, as demonstrated by the following code segment:

...

// Create and set up a parameter object:
BernsenThresholder.Parameters params =
    new BernsenThresholder.Parameters();
params.radius = 15;
params.cmin = 15;
params.bgMode = BackgroundMode.DARK;

// Create the thresholder:
AdaptiveThresholder thr = new BernsenThresholder(params);

// Perform the threshold operation:
thr.threshold(I);

...

This  example  also  shows  how  to  specify  a  parameter  object  (params) 
for  the  instantiation  of  the  thresholder. 
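Conceptually, the two-step form threshold(I, Q) is a per-pixel comparison of I with the threshold surface Q. The plain-Java sketch below (hypothetical names) assumes that pixels strictly above their local threshold are set to 255 and all others to 0. The exact output convention of the imagingbook method is not stated here and should be checked in the online code. Under this assumption, a default threshold at the top of the 8-bit range (dark background) sends flat regions to 0. A default of 0 (bright background) sends nonzero flat regions to 255. In both cases low-contrast regions take on the background's intensity.

```java
/** Applies a threshold surface Q to an image I (both indexed [v][u]); hypothetical helper. */
final class ApplySurface {

    static int[][] binarize(int[][] I, int[][] Q) {
        int N = I.length, M = I[0].length;
        int[][] out = new int[N][M];
        for (int v = 0; v < N; v++) {
            for (int u = 0; u < M; u++) {
                // assumed convention: above the local threshold -> 255, otherwise 0
                out[v][u] = (I[v][u] > Q[v][u]) ? 255 : 0;
            }
        }
        return out;
    }
}
```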


11.4 Summary and Further Reading

The  intention  of  this  chapter  was  to  give  an  overview  of  established 
methods  for  automatic  image  thresholding.  A  vast  body  of  relevant 
literature  exists,  and  thus  only  a  fraction  of  the  proposed  techniques 
could  be  discussed  here.  For  additional  approaches  and  references, 
several  excellent  surveys  are  available,  including  [86,  178,  204,  231] 
and  [213]. 

Given the obvious limitations of global techniques, adaptive thresholding
methods have received continued interest and are still a focus
of ongoing research. One popular approach is to calculate an
adaptive threshold through image decomposition. In this case, the
image is partitioned into (possibly overlapping) tiles, an “optimal”
threshold is calculated for each tile, and the adaptive threshold is
obtained by interpolation between adjacent tiles. Another interesting
idea, proposed in [260], is to specify a “threshold surface” by
sampling the image at specific points that exhibit a high gradient,
with the assumption that these points are at transitions between the
background and the foreground. From these irregularly spaced point
samples, a smooth surface is interpolated that passes through the
sample points. This interpolation is done by solving a Laplacian
difference equation to obtain a continuous “potential surface”, using
the so-called “successive over-relaxation” method. This method requires
about N scans over an image of size N × N to converge, so its time
complexity is an expensive O(N³). A more efficient approach was proposed
in [26], which uses a hierarchical, multi-scale algorithm for interpolating
the threshold surface. Similarly, a quad-tree representation
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was  used  for  this  purpose  in  [49].  Another  interesting  concept  is  ^1  5  Exercises 
“kriging”  [175],  which  was  originally  developed  for  interpolating  2D 
geological  data  [190,  Ch.  3,  Sec.  3.7.4]. 
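The following is a rough sketch, not the method of [260], of how a threshold surface might be interpolated from sparse samples by successive over-relaxation. Sample pixels are flagged as fixed; every other pixel is repeatedly moved toward the mean of its 4-neighbors (the Gauss-Seidel value), overshooting by the relaxation factor ω, which drives the surface toward a solution of the discrete Laplace equation. Each sweep touches every pixel once, i.e., costs $O(N^2)$ on an N × N image, which is where the $O(N^3)$ total comes from when about N sweeps are needed. The class name, the border handling (only existing neighbors are averaged), and the fixed number of sweeps are assumptions.

```java
/**
 * Hypothetical sketch: interpolates a threshold surface from sparse samples
 * by solving the discrete Laplace equation with successive over-relaxation.
 * Pixels with fixed[v][u] == true keep their sample value from init.
 */
final class LaplaceSurfaceSketch {

    static double[][] interpolate(double[][] init, boolean[][] fixed,
                                  double omega, int sweeps) {
        int h = init.length, w = init[0].length;
        double[][] s = new double[h][];
        for (int v = 0; v < h; v++) {
            s[v] = init[v].clone();               // leave the caller's array intact
        }
        for (int it = 0; it < sweeps; it++) {     // one sweep costs O(w * h)
            for (int v = 0; v < h; v++) {
                for (int u = 0; u < w; u++) {
                    if (fixed[v][u]) continue;    // sample points stay fixed
                    double sum = 0;
                    int n = 0;
                    // mean of the existing 4-neighbors (assumed border handling)
                    if (u > 0)     { sum += s[v][u - 1]; n++; }
                    if (u < w - 1) { sum += s[v][u + 1]; n++; }
                    if (v > 0)     { sum += s[v - 1][u]; n++; }
                    if (v < h - 1) { sum += s[v + 1][u]; n++; }
                    if (n == 0) continue;
                    double gaussSeidel = sum / n;
                    // over-relaxed step: move past the Gauss-Seidel value by factor omega
                    s[v][u] += omega * (gaussSeidel - s[v][u]);
                }
            }
        }
        return s;
    }
}
```

For a single row with fixed end values 0 and 100, the sweeps converge to the linear ramp 0, 25, 50, 75, 100, the discrete harmonic interpolant. Values of ω between 1 and 2 typically speed up convergence compared to plain Gauss-Seidel (ω = 1).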

In the case of color images, simple thresholding is often applied
individually to each color channel and the results are subsequently
merged using a suitable logical operation. Transformation to a non-RGB
color space (such as HSV or CIELAB) might be helpful for
this purpose. For a binarization method aimed specifically at vector-valued
images, see [159], for example. Since thresholding can be
viewed as a specific form of segmentation, color segmentation methods
[50, 53, 85, 216] are also relevant for binarizing color images.
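The channel-wise scheme described above can be sketched as follows, assuming the three color channels are available as separate int[][] arrays and that a threshold has already been determined for each channel (for instance by one of the global methods of this chapter). The class name and the choice of AND as the merging operation are illustrative; other logical combinations (e.g., OR) are equally possible.

```java
/**
 * Hypothetical sketch: binarizes an RGB image by thresholding each color
 * channel with its own threshold and merging the three binary results with
 * a pixelwise logical AND. Channels are int[height][width] arrays.
 */
final class ColorThresholdSketch {

    static boolean[][] thresholdRgb(int[][] R, int[][] G, int[][] B,
                                    int qR, int qG, int qB) {
        int h = R.length, w = R[0].length;
        boolean[][] fg = new boolean[h][w];
        for (int v = 0; v < h; v++) {
            for (int u = 0; u < w; u++) {
                boolean r = R[v][u] > qR;   // binary result of the red channel
                boolean g = G[v][u] > qG;   // binary result of the green channel
                boolean b = B[v][u] > qB;   // binary result of the blue channel
                fg[v][u] = r && g && b;     // merge with a logical AND
            }
        }
        return fg;
    }
}
```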


11.5  Exercises 

Exercise  11.1.  Define  a  procedure  for  estimating  the  minimum  and 
maximum  pixel  value  of  an  image  from  its  histogram.  Threshold 
the  image  at  the  resulting  mid-range  value  (see  Eqn.  (11.12)).  Can 
anything  be  said  about  the  size  of  the  resulting  partitions? 

Exercise 11.2. Define a procedure for estimating the median of an
image from its histogram. Threshold the image at the resulting median
value (see Eqn. (11.11)) and verify that the foreground and background
partitions are of approximately equal size.

Exercise 11.3. The algorithms described in this chapter assume 8-bit
grayscale input images (of type ByteProcessor in ImageJ). Adapt
the current implementations to work with 16-bit integer images (of
type ShortProcessor). Images of this type may contain pixel values
in the range $[0, 2^{16}-1]$, and the getHistogram() method returns the
histogram as an integer array of length 65536.

Exercise 11.4. Implement simple thresholding for RGB color images
by thresholding each (scalar-valued) color channel individually
and then merging the results by performing a pixel-wise AND operation.
Compare the results to those obtained by thresholding the
corresponding grayscale (luminance) images.

Exercise 11.5. Re-implement the Bernsen and/or Niblack thresholder
(classes BernsenThresholder and NiblackThresholder) using
integral images (see Ch. 3, Sec. 3.8) for efficiently calculating the
required local mean and variance of the input image over a rectangular
support region R.
12


Color  Images 


Color images are involved in every aspect of our lives, where they play
an important role in everyday activities such as television, photography,
and printing. Color perception is a fascinating and complicated
phenomenon that has occupied the interests of scientists, psychologists,
philosophers, and artists for hundreds of years [211, 217]. In
this chapter, we focus on those technical aspects of color that are
most important for working with digital color images. Our emphasis
will be on understanding the various representations of color and
correctly utilizing them when programming. Additional color-related
issues, such as colorimetric color spaces, color quantization, and color
filters, are covered in subsequent chapters.


12.1  RGB  Color  Images 

The RGB color scheme encodes colors as combinations of the three
primary colors: red, green, and blue (R, G, B). This scheme is widely
used for transmission, representation, and storage of color images on
both analog devices such as television sets and digital devices such
as computers, digital cameras, and scanners. For this reason, many
image-processing and graphics programs use the RGB scheme as their
internal representation for color images, and most language libraries,
including Java's imaging APIs, use it as their standard image
representation.

RGB  is  an  additive  color  system,  which  means  that  all  colors  start 
with  black  and  are  created  by  adding  the  primary  colors.  You  can 
think  of  color  formation  in  this  system  as  occurring  in  a  dark  room 
where  you  can  overlay  three  beams  of  light — one  red,  one  green,  and 
one  blue — on  a  sheet  of  white  paper.  To  create  different  colors,  you 
would  modify  the  intensity  of  each  of  these  beams  independently. 

The distinct intensity of each primary color beam controls the shade
and brightness of the resulting color. The colors gray and white are
created by mixing the three primary color beams at the same intensity.
A similar operation occurs on the screen of a color television or
CRT1-based computer monitor, where tiny, close-lying dots of red,
green, and blue phosphor are simultaneously excited by a stream
of electrons to distinct energy levels (intensities), creating a seemingly
continuous color image.

Fig. 12.1

Representation of the RGB color space as a 3D unit cube.
The primary colors red (R), green (G), and blue (B) form
the coordinate system. The "pure" red (R), green (G),
blue (B), cyan (C), magenta (M), and yellow (Y) colors
lie on the vertices of the color cube. All the shades
of gray, of which K is an example, lie on the diagonal
between black S and white W.

The RGB color space can be visualized as a 3D unit cube in which
the three primary colors form the coordinate axes. The RGB values
are non-negative and lie in the range $[0, C_{\max}]$; for most digital images,
$C_{\max} = 255$. Every possible color $\mathbf{C}_i$ corresponds to a point within
the RGB color cube of the form

$$\mathbf{C}_i = (R_i, G_i, B_i),$$

where $0 \le R_i, G_i, B_i \le C_{\max}$. RGB values are often normalized to
the interval [0, 1] so that the resulting color space forms a unit cube
(Fig. 12.1). The point S = (0, 0, 0) corresponds to the color black,
W = (1, 1, 1) corresponds to the color white, and all the points lying
on the diagonal between S and W are shades of gray created from
equal color components R = G = B.

Figure  12.2  shows  a  color  test  image  and  its  corresponding  RGB 
color  components,  displayed  here  as  intensity  images.  We  will  refer 
to  this  image  in  a  number  of  examples  that  follow  in  this  chapter. 

RGB  is  a  very  simple  color  system,  and  as  demonstrated  in  Sec. 
12.2,  a  basic  knowledge  of  it  is  often  sufficient  for  processing  color 
images  or  transforming  them  into  other  color  spaces.  At  this  point, 
we  will  not  be  able  to  determine  what  color  a  particular  RGB  pixel 
corresponds  to  in  the  real  world,  or  even  what  the  primary  colors  red, 
green,  and  blue  truly  mean  in  a  physical  (i.e.,  colorimetric)  sense.  For 
now  we  rely  on  our  intuitive  understanding  of  color  and  will  address 
colorimetry  and  color  spaces  later  in  the  context  of  the  CIE  color 
system  (see  Ch.  14). 


12.1.1  Structure  of  Color  Images 

Color  images  are  represented  in  the  same  way  as  grayscale  images,  by 
using  an  array  of  pixels  in  which  different  models  are  used  to  order  the 
1 Cathode ray tube.




Fig. 12.2

A color image and its corresponding RGB channels. The
fruits depicted are mainly yellow and red and therefore have
high values in the R and G channels. In these regions, the
B content is correspondingly lower (represented here by
darker gray values) except for the bright highlights on the
apple, where the color changes gradually to white. The tabletop
in the foreground is purple and therefore displays correspondingly
higher values in its B channel.


individual color components. In the next sections we will examine the
difference between true color images, which utilize colors uniformly
selected from the entire color space, and so-called paletted or indexed
images, in which only a select set of distinct colors is used. Both
types of images are widely used in practice; deciding which type to
use depends on the requirements of the application.

True  color  images 

A pixel in a true color image can represent any color in its color
space, as long as it falls within the (discrete) range of its individual
color components. True color images are appropriate when the image
contains many colors with subtle differences, as occurs in digital
photography and photo-realistic computer graphics. Next we look at
two methods of ordering the color components in true color images:
component ordering and packed ordering.


Fig. 12.3

RGB color image in component ordering. The three
color components are laid out in separate arrays $I_R$,
$I_G$, $I_B$ of the same size.
Component ordering

In component ordering (also referred to as planar ordering) the color
components are laid out in separate arrays of identical dimensions.
In this case, the color image

$$\mathbf{I}_{\mathrm{comp}} = (I_R, I_G, I_B) \tag{12.1}$$

can be thought of as a vector of related intensity images $I_R$, $I_G$, and
$I_B$ (Fig. 12.3), and the RGB values of the color image I at position
(u, v) are obtained by accessing the three component images in the
form

$$\begin{pmatrix} R(u,v) \\ G(u,v) \\ B(u,v) \end{pmatrix} = \begin{pmatrix} I_R(u,v) \\ I_G(u,v) \\ I_B(u,v) \end{pmatrix}. \tag{12.2}$$

Packed  ordering 

In packed ordering, the component values that represent the color of
a particular pixel are packed together into a single element of the
image array (Fig. 12.4) such that

$$\mathbf{I}_{\mathrm{pack}}(u,v) = (R, G, B). \tag{12.3}$$

The RGB value of a packed image I at the location (u, v) is obtained
by accessing the individual components of the color pixel as

$$\begin{pmatrix} R(u,v) \\ G(u,v) \\ B(u,v) \end{pmatrix} = \begin{pmatrix} \mathrm{Red}(\mathbf{I}_{\mathrm{pack}}(u,v)) \\ \mathrm{Green}(\mathbf{I}_{\mathrm{pack}}(u,v)) \\ \mathrm{Blue}(\mathbf{I}_{\mathrm{pack}}(u,v)) \end{pmatrix}. \tag{12.4}$$

The access functions Red(), Green(), and Blue() depend on the specific
implementation used for encoding the color pixels.
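As an illustration of the two orderings, the following sketch converts between three separate component arrays (planar ordering) and a single array of packed int values. It assumes flat, row-major pixel arrays and the 0x00RRGGBB layout used by ImageJ (described in Sec. 12.1.2), so here Red(), Green(), and Blue() reduce to shifts and masks; the class and method names are hypothetical.

```java
/**
 * Hypothetical sketch converting between component (planar) ordering,
 * with three arrays IR, IG, IB, and packed ordering, with one int per
 * pixel laid out as 0x00RRGGBB (alpha left at zero). Arrays are flat
 * and row-major, i.e., pixel (u, v) is stored at index v * width + u.
 */
final class OrderingSketch {

    /** Packs three component arrays into one array of ints. */
    static int[] toPacked(int[] IR, int[] IG, int[] IB) {
        int[] packed = new int[IR.length];
        for (int i = 0; i < packed.length; i++) {
            packed[i] = ((IR[i] & 0xff) << 16) | ((IG[i] & 0xff) << 8) | (IB[i] & 0xff);
        }
        return packed;
    }

    /** Returns {IR, IG, IB}, the three component arrays of a packed image. */
    static int[][] toPlanar(int[] packed) {
        int n = packed.length;
        int[][] planes = new int[3][n];
        for (int i = 0; i < n; i++) {
            planes[0][i] = (packed[i] >> 16) & 0xff; // Red()
            planes[1][i] = (packed[i] >> 8) & 0xff;  // Green()
            planes[2][i] = packed[i] & 0xff;         // Blue()
        }
        return planes;
    }
}
```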

Indexed  images 

Indexed  images  permit  only  a  limited  number  of  distinct  colors  and 
therefore  are  used  mostly  for  illustrations  and  graphics  that  contain 
large  regions  of  the  same  color.  Often  these  types  of  images  are  stored 
in  indexed  GIF  or  PNG  files  for  use  on  the  Web.  In  these  indexed 
Fig. 12.4

RGB color image using packed ordering. The three color
components R, G, and B are placed together in a single
array element.
images, the pixel array does not contain color or brightness data but
instead consists of integer numbers q that are used to index into a
color table or "palette"

$$P = (P_R, P_G, P_B) : [0, Q-1] \mapsto [0, K-1]^3. \tag{12.5}$$

Here Q denotes the size of the color table, equal to the maximum
number of distinct image colors (typically Q = 2, ..., 256). K is the
number of distinct component values (typically K = 256). This table
contains a specific color vector $P(q) = (R_q, G_q, B_q)$ for every color
index q = 0, ..., Q−1 (see Fig. 12.5). The RGB component values of
an indexed image $I_{\mathrm{idx}}$ at position (u, v) are obtained as

$$\begin{pmatrix} R(u,v) \\ G(u,v) \\ B(u,v) \end{pmatrix} = \begin{pmatrix} R_q \\ G_q \\ B_q \end{pmatrix} = \begin{pmatrix} P_R(q) \\ P_G(q) \\ P_B(q) \end{pmatrix}, \tag{12.6}$$

with the index $q = I_{\mathrm{idx}}(u, v)$. To allow proper reconstruction, the
color table P must of course be stored and/or transmitted along with
the indexed image.
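Eqn. (12.6) amounts to a simple table lookup per pixel. The sketch below assumes the palette is given as three int arrays PR, PG, PB of length Q with values in 0..255, and returns a packed 0x00RRGGBB image; the class name and array layout are illustrative choices, not an ImageJ API.

```java
/**
 * Hypothetical sketch of Eqn. (12.6): reconstructs the RGB values of an
 * indexed image from its color table. The palette is given as component
 * arrays PR, PG, PB (values 0..255) of length Q; idx holds values in [0, Q-1].
 */
final class IndexedSketch {

    /** Returns a packed RGB image (0x00RRGGBB) for a flat index array. */
    static int[] toRgb(int[] idx, int[] PR, int[] PG, int[] PB) {
        int[] rgb = new int[idx.length];
        for (int i = 0; i < idx.length; i++) {
            int q = idx[i];  // the pixel stores only an index into the palette
            rgb[i] = (PR[q] << 16) | (PG[q] << 8) | PB[q];
        }
        return rgb;
    }
}
```

With the two-entry palette (255, 0, 0) and (0, 0, 255), the index array {1, 0, 1} yields blue, red, and blue pixels; without the palette, the index values alone carry no color information.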
Fig. 12.5

RGB indexed image. The image array $I_{\mathrm{idx}}$ itself does not
contain any color component values. Instead, each cell contains
an index $q \in [0, Q-1]$ into the associated color table
("palette") P. The actual color value is specified by the table
entry $P(q) = (R_q, G_q, B_q)$.


During the transformation from a true color image to an indexed
image (e.g., from a JPEG image to a GIF image), the problem of
optimal color reduction, or color quantization, arises. Color quantization
is the process of determining an optimal color table and then
mapping the original colors to it. This process is described in detail
in Chapter 13.
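Finding the optimal color table is the subject of Ch. 13; the second step, mapping each original color to the table, can be sketched independently. The example below assumes the palette has already been chosen and uses an exhaustive search with squared Euclidean distance in RGB, which is one common choice but not necessarily the one used in Ch. 13; the class name is hypothetical.

```java
/**
 * Hypothetical sketch of the mapping step of color quantization: each
 * original RGB color is replaced by the index of the closest palette entry
 * (squared Euclidean distance in RGB, exhaustive search).
 */
final class PaletteMappingSketch {

    /** palette[q] = {R_q, G_q, B_q}; returns the index q of the nearest entry. */
    static int nearestIndex(int r, int g, int b, int[][] palette) {
        int best = 0;
        long bestDist = Long.MAX_VALUE;
        for (int q = 0; q < palette.length; q++) {
            long dr = r - palette[q][0];
            long dg = g - palette[q][1];
            long db = b - palette[q][2];
            long d = dr * dr + dg * dg + db * db;
            if (d < bestDist) {
                bestDist = d;
                best = q;
            }
        }
        return best;
    }
}
```

Applying nearestIndex() to every pixel produces the index array $I_{\mathrm{idx}}$ of Eqn. (12.6); the cost is proportional to the number of pixels times Q.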


Fig. 12.6

Structure of a packed RGB color pixel in Java. Within a
32-bit int, 8 bits are allocated to each of the color
components R, G, B and to the transparency value α
(unused in ImageJ).


12.1.2 Color Images in ImageJ

ImageJ  provides  two  simple  types  of  color  images: 

•  RGB  full-color  images  (24-bit  “RGB  color”). 

•  Indexed  images  (“8-bit  color”). 

RGB  true  color  images 

RGB color images in ImageJ use packed ordering (see Sec. 12.1.1),
where each color pixel is represented by a 32-bit int value. As Fig.
12.6 illustrates, 8 bits are used to represent each of the RGB components,
which limits the range of the individual components to 0-255.
The remaining 8 bits are reserved for the transparency,2 or alpha (α),
component. This is also the usual ordering in Java3 for RGB color
images.


[Fig. 12.6 bit layout: α in bits 31-24, R in bits 23-16, G in bits 15-8, B in bits 7-0]


Accessing  RGB  pixel  values 

RGB  color  images  are  represented  by  an  array  of  pixels,  the  elements 
of  which  are  standard  Java  ints.  To  disassemble  the  packed  int 
value  into  the  three  color  components,  you  apply  the  appropriate 
bitwise  shifting  and  masking  operations.  In  the  following  example, 
we  assume  that  the  image  processor  ip  (of  type  ColorProcessor) 
contains  an  RGB  color  image: 

int c = ip.getPixel(u, v);      // a packed RGB color pixel
int r = (c & 0xff0000) >> 16;   // red component
int g = (c & 0x00ff00) >> 8;    // green component
int b = (c & 0x0000ff);         // blue component

In this example, each of the RGB components of the packed pixel
c is isolated using a bitwise AND operation (&) with an appropriate
bit mask (following convention, bit masks are given in hexadecimal4
notation), and afterwards the extracted bits are shifted right by 16
(for R) or 8 (for G) bit positions (see Fig. 12.7).

The “assembly” of an RGB pixel from separate R, G, and B
values works in the opposite direction using the bitwise OR operator
(|) and shifting the bits left (<<):

int r = 169;   // red component
int g = 212;   // green component
int b = 17;    // blue component
int c = ((r & 0xff) << 16) | ((g & 0xff) << 8) | (b & 0xff);
ip.putPixel(u, v, c);
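The two fragments above can be wrapped into small static helpers that run without ImageJ, which makes the bit manipulation easy to check in isolation. The class and method names are hypothetical. Because each component is masked before it is shifted, red() also returns the correct value when the top (alpha) byte is set, which would otherwise make the int negative.

```java
/** Hypothetical helpers wrapping the packing/unpacking bit operations shown above. */
final class PackedRgb {

    static int red(int c)   { return (c & 0xff0000) >> 16; }  // bits 16-23
    static int green(int c) { return (c & 0x00ff00) >> 8; }   // bits 8-15
    static int blue(int c)  { return c & 0x0000ff; }          // bits 0-7

    /** Assembles a packed 0x00RRGGBB pixel from three components. */
    static int pack(int r, int g, int b) {
        return ((r & 0xff) << 16) | ((g & 0xff) << 8) | (b & 0xff);
    }
}
```

For the values used above, pack(169, 212, 17) yields 0xA9D411, since 169 = 0xA9, 212 = 0xD4, and 17 = 0x11.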

2 The transparency value α (alpha) represents the ability to see through
a color pixel onto the background. At this time, the α channel is unused
in ImageJ.

3 Java Abstract Window Toolkit (AWT).

4 The mask 0xff0000 is of type int and represents the 32-bit binary
pattern 00000000111111110000000000000000.
1  // File Brighten_RGB_1.java
2  import ij.ImagePlus;
3  import ij.plugin.filter.PlugInFilter;
4  import ij.process.ImageProcessor;
5
6  public class Brighten_RGB_1 implements PlugInFilter {
7
8    public int setup(String arg, ImagePlus imp) {
9      return DOES_RGB; // this plugin works on RGB images
10   }
11
12   public void run(ImageProcessor ip) {
13     int[] pixels = (int[]) ip.getPixels();
14
15     for (int i = 0; i < pixels.length; i++) {
16       int c = pixels[i];
17       // split color pixel into rgb-components:
18       int r = (c & 0xff0000) >> 16;
19       int g = (c & 0x00ff00) >> 8;
20       int b = (c & 0x0000ff);
21       // modify colors:
22       r = r + 10; if (r > 255) r = 255;
23       g = g + 10; if (g > 255) g = 255;
24       b = b + 10; if (b > 255) b = 255;
25       // reassemble color pixel and insert into pixel array:
26       pixels[i]
27         = ((r & 0xff) << 16) | ((g & 0xff) << 8) | (b & 0xff);
28     }
29   }
30 }


Fig. 12.7

Decomposition of a 32-bit RGB color pixel using bit operations.
The R component (bits 16-23) of the RGB pixel C (above)
is isolated using a bitwise AND operation (&) together with
a bit mask M = 0xff0000. All bits except the R component
are set to the value 0, while the bit pattern within the R
component remains unchanged. This bit pattern is subsequently
shifted 16 positions to the right (>>), so that the R component
is moved into the lowest 8 bits and its value lies in the
range 0, ..., 255. During the shift operation, zeros are
filled in from the left.


Prog. 12.1

Processing RGB color data with the use of bit operations
(ImageJ plugin, version 1). This plugin increases the values
of all three color components by 10 units. It demonstrates
the use of direct access to the pixel array (line 16), the
separation of color components using bit operations
(lines 18-20), and the reassembly of color pixels after
modification (line 27). The value DOES_RGB (defined in the
interface PlugInFilter) returned by the setup() method
indicates that this plugin is designed to work on RGB
formatted true color images (line 9).


Masking the component values with 0xff works in this case because,
except for the bits in positions 0, ..., 7 (values in the range 0-255),
all the other bits are already set to zero. A complete example of
manipulating an RGB color image using bit operations is presented
in Prog. 12.1. Instead of accessing color pixels using ImageJ's access
functions, this program directly accesses the pixel array for increased
efficiency.

The ImageJ class ColorProcessor provides an easy-to-use alternative,
which returns the separated RGB components (as an int array
Prog. 12.2

Working with RGB color images without bit operations
(ImageJ plugin, version 2). This plugin increases the values
of all three color components by 10 units using the
access methods getPixel(int, int, int[]) and
putPixel(int, int, int[]) from the class ColorProcessor
(lines 21 and 25, respectively). Execution time is
approximately four times higher than that of version 1
(Prog. 12.1) because of the additional method calls.


1  // File Brighten_RGB_2.java
2  import ij.ImagePlus;
3  import ij.plugin.filter.PlugInFilter;
4  import ij.process.ColorProcessor;
5  import ij.process.ImageProcessor;
6
7  public class Brighten_RGB_2 implements PlugInFilter {
8    static final int R = 0, G = 1, B = 2; // component indices
9
10   public int setup(String arg, ImagePlus imp) {
11     return DOES_RGB; // this plugin works on RGB images
12   }
13
14   public void run(ImageProcessor ip) {
15     // typecast the image to ColorProcessor (no duplication):
16     ColorProcessor cp = (ColorProcessor) ip;
17     int[] RGB = new int[3];
18
19     for (int v = 0; v < cp.getHeight(); v++) {
20       for (int u = 0; u < cp.getWidth(); u++) {
21         cp.getPixel(u, v, RGB);
22         RGB[R] = Math.min(RGB[R] + 10, 255); // add 10 and
23         RGB[G] = Math.min(RGB[G] + 10, 255); // limit to 255
24         RGB[B] = Math.min(RGB[B] + 10, 255);
25         cp.putPixel(u, v, RGB);
26       }
27     }
28   }
29 }


with  three  elements).  In  the  following  example,  which  demonstrates 
its  use,  ip  is  of  type  ColorProcessor: 

int[] RGB = new int[3];
...
ip.getPixel(u, v, RGB);   // modifies RGB
int r = RGB[0];
int g = RGB[1];
int b = RGB[2];
...
ip.putPixel(u, v, RGB);

A more detailed and complete example is shown by the simple plugin
in Prog. 12.2, which increases the value of all three color components
of an RGB image by 10 units. Notice that the plugin limits the
resulting component values to 255, because the putPixel() method
only uses the lowest 8 bits of each component and does not test if
the value passed in is out of the permitted 0-255 range. Without
this test, arithmetic overflow can occur (values above 255 wrap
around). The price for using this access method, instead of direct
array access, is a noticeably longer running time (approximately a
factor of 4 when compared to the version in Prog. 12.1).
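To see why the clamping in Prog. 12.2 matters, the sketch below packs a component value of 260 (i.e., 250 + 10) in two ways. packMasked() models the behavior described above for putPixel(), keeping only the lowest 8 bits, so the bright component silently turns into 4; packClamped() limits the value first, as Prog. 12.2 does with Math.min(). Both method names are hypothetical and not part of ImageJ.

```java
/**
 * Hypothetical sketch contrasting two ways of packing out-of-range
 * component values: keeping only the lowest 8 bits (values wrap around)
 * versus clamping to [0, 255] before packing.
 */
final class ClampSketch {

    /** Keeps only the lowest 8 bits of each component: 260 becomes 4. */
    static int packMasked(int r, int g, int b) {
        return ((r & 0xff) << 16) | ((g & 0xff) << 8) | (b & 0xff);
    }

    /** Clamps each component to [0, 255] first: 260 becomes 255. */
    static int packClamped(int r, int g, int b) {
        return packMasked(clamp(r), clamp(g), clamp(b));
    }

    static int clamp(int x) {
        return Math.max(0, Math.min(255, x));
    }
}
```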
Opening and saving RGB images

ImageJ supports the following types of image formats for RGB true
color images:

• TIFF (uncompressed only): 3 × 8-bit RGB. TIFF color images
with 16-bit depth are opened as an image stack consisting of three
16-bit intensity images.

• BMP, JPEG: 3 × 8-bit RGB.

• PNG: 3 × 8-bit RGB.

• RAW: using the ImageJ menu File > Import > Raw, RGB images
can be opened whose format is not directly supported by ImageJ.
It is then possible to select different arrangements of the
color components.

Creating  RGB  color  images 

The  simplest  way  to  create  a  new  RGB  image  using  ImageJ  is  to  use 
an  instance  of  the  class  ColorProcessor,  as  the  following  example 
demonstrates: 

int w = 640, h = 480;
ColorProcessor cp = new ColorProcessor(w, h);
(new ImagePlus("My New Color Image", cp)).show();

When needed, the color image can be displayed by creating an instance
of the class ImagePlus and calling its show() method. Since
cp is of type ColorProcessor, the resulting ImagePlus object
is also a color image.

Indexed  color  images 

The  structure  of  an  indexed  image  in  ImageJ  is  given  in  Fig.  12.5, 
where  each  element  of  the  index  array  is  8  bits  and  therefore  can 
represent  a  maximum  of  256  different  colors.  When  programming, 
indexed  images  are  similar  to  grayscale  images,  as  both  make  use 
of  a  color  table  to  determine  the  actual  color  of  the  pixel.  Indexed 
images  differ  from  grayscale  images  only  in  that  the  contents  of  the 
color  table  are  not  intensity  values  but  RGB  values. 

Opening  and  saving  indexed  images 

ImageJ supports indexed images in GIF, PNG, BMP, and TIFF
format with index values of 1-8 bits (i.e., 2-256 distinct colors) and
3 × 8-bit color values.

Processing  indexed  images 

The  indexed  format  is  mostly  used  as  a  space-saving  means  of  image 
storage  and  is  not  directly  useful  as  a  processing  format  since  an 
index  value  in  the  pixel  array  is  arbitrarily  related  to  the  actual 
color,  found  in  the  color  table,  that  it  represents.  When  working 
with  indexed  images  it  usually  makes  no  sense  to  base  any  numerical 
interpretations  on  the  pixel  values  or  to  apply  any  filter  operations 
designed  for  8-bit  intensity  images.  Figure  12.8  illustrates  an  example 
of  applying  a  Gaussian  filter  and  a  median  filter  to  the  pixels  of  an 
indexed  image.  Since  there  is  no  meaningful  quantitative  relation 
between  the  actual  colors  and  the  index  values,  the  results  are  erratic. 
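A tiny numeric example shows how erratic such results can be. The sketch below assumes a hypothetical three-entry palette (red, green, blue) and compares averaging the index values of a red and a blue pixel, as a linear filter applied to the pixel array would, with averaging their actual colors. The index average selects green, a color unrelated to either input, whereas averaging the colors gives a purple tone. The palette and class name are invented for illustration.

```java
/**
 * Hypothetical sketch: averaging index values of an indexed image selects
 * an unrelated palette color, unlike averaging the actual RGB colors.
 */
final class IndexAveragingSketch {

    // assumed palette; each row is (R, G, B)
    static final int[][] PALETTE = { {255, 0, 0}, {0, 255, 0}, {0, 0, 255} };

    /** Averages two index values, as a linear filter on the pixel array would. */
    static int averageIndex(int q1, int q2) {
        return (q1 + q2) / 2;
    }

    /** Averages the actual colors of two palette entries instead. */
    static int[] averageColor(int q1, int q2) {
        int[] c = new int[3];
        for (int k = 0; k < 3; k++) {
            c[k] = (PALETTE[q1][k] + PALETTE[q2][k]) / 2;
        }
        return c;
    }
}
```

Here averageIndex(0, 2) returns 1, i.e., pure green, while averageColor(0, 2) returns (127, 0, 127).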
Fig. 12.8

Improper application of smoothing filters to an indexed
color image. Indexed image with 16 colors (a) and
results of applying a linear smoothing filter (b) and a
3 × 3 median filter (c) to the pixel array (that is, the index
values). The application of a linear filter makes no sense, of
course, since no meaningful relation exists between the index
values in the pixel array and the actual image intensities.
While the median filter (c) delivers seemingly plausible
results in this case, its use is also inadmissible because no
meaningful ordering relation exists between the index values.


Note  that  even  the  use  of  the  median  filter  is  inadmissible  because 
no  ordering  relation  exists  between  the  index  values.  Thus,  with  few 
exceptions,  ImageJ  functions  do  not  permit  the  application  of  such 
operations  to  indexed  images.  Generally,  when  processing  an  indexed 
image,  you  first  convert  it  into  a  true  color  RGB  image  and  then  after 
processing  convert  it  back  into  an  indexed  image. 

When an ImageJ plugin is supposed to process indexed images,
its setup() method should return the DOES_8C ("8-bit color") flag.
The plugin in Prog. 12.3 shows how to increase the intensity of the
three color components of an indexed image by 10 units (analogously
to Progs. 12.1 and 12.2 for RGB images). Notice how in indexed
images only the palette is modified and the original pixel data, the
index values, remain the same. The color table of ImageProcessor
is accessible through a ColorModel5 object, which can be read using
the method getColorModel() and modified using setColorModel().

The ColorModel object for indexed images (as well as 8-bit
grayscale images) is an instance of IndexColorModel (a subclass of
ColorModel), which contains three color tables (maps) representing
the red, green, and blue components as separate byte arrays. The size
of these tables (2, ..., 256) can be determined by calling the method
getMapSize(). Note that the elements of the palette should be
interpreted as unsigned bytes with values ranging from 0 to 255. Just
as with grayscale pixel values, during the conversion to int values,
these color component values must also be bitwise masked with 0xff
as shown in Prog. 12.3 (lines 30-32).

As a further example, Prog. 12.4 shows how to convert an indexed
image to a true color RGB image of type ColorProcessor. Conversion
in this direction poses no problems because the RGB component
values for a particular pixel are simply taken from the corresponding
color table entry, as described by Eqn. (12.6). On the other hand,

5 Defined in the standard Java class java.awt.image.ColorModel.
1  // File Brighten_Index_Image.java
2
3  import ij.ImagePlus;
4  import ij.plugin.filter.PlugInFilter;
5  import ij.process.ImageProcessor;
6  import java.awt.image.IndexColorModel;
7
8  public class Brighten_Index_Image implements PlugInFilter {
9
10   public int setup(String arg, ImagePlus imp) {
11     return DOES_8C; // this plugin works on indexed color images
12   }
13
14   public void run(ImageProcessor ip) {
15     IndexColorModel icm =
16         (IndexColorModel) ip.getColorModel();
17     int pixBits = icm.getPixelSize();
18     int nColors = icm.getMapSize();
19
20     // retrieve the current lookup tables (maps) for R, G, B:
21     byte[] pRed = new byte[nColors];
22     byte[] pGrn = new byte[nColors];
23     byte[] pBlu = new byte[nColors];
24     icm.getReds(pRed);
25     icm.getGreens(pGrn);
26     icm.getBlues(pBlu);
27
28     // modify the lookup tables:
29     for (int idx = 0; idx < nColors; idx++) {
30       int r = 0xff & pRed[idx]; // mask to treat as unsigned byte
31       int g = 0xff & pGrn[idx];
32       int b = 0xff & pBlu[idx];
33       pRed[idx] = (byte) Math.min(r + 10, 255);
34       pGrn[idx] = (byte) Math.min(g + 10, 255);
35       pBlu[idx] = (byte) Math.min(b + 10, 255);
36     }
37     // create a new color model and apply to the image:
38     IndexColorModel icm2 =
39         new IndexColorModel(pixBits, nColors, pRed, pGrn, pBlu);
40     ip.setColorModel(icm2);
41   }
42 }


Prog.  12.3 

Working  with  indexed  images 
(Image J  plugin).  This  plugin 
increases  the  brightness  of  an 
image  by  10  units  by  modi¬ 
fying  the  image’s  color  table 
(palette).  The  actual  values 
in  the  pixel  array,  which  are 
indices  into  the  palette,  are  not 
changed. 


conversion  in  the  other  direction  requires  quantization  of  the  RGB 
color  space  and  is  as  a  rule  more  difficult  and  involved  (see  Ch.  13 
for  details).  In  practice,  most  applications  make  use  of  existing  con¬ 
version  methods  such  as  those  provided  by  the  ImageJ  API. 

Prog. 12.4

Converting an indexed image to a true color RGB image (ImageJ plugin).

// File Index_To_Rgb.java

import ij.ImagePlus;
import ij.plugin.filter.PlugInFilter;
import ij.process.ColorProcessor;
import ij.process.ImageProcessor;
import java.awt.image.IndexColorModel;

public class Index_To_Rgb implements PlugInFilter {
  static final int R = 0, G = 1, B = 2;
  ImagePlus imp;

  public int setup(String arg, ImagePlus imp) {
    this.imp = imp;
    return DOES_8C + NO_CHANGES; // does not alter original image
  }

  public void run(ImageProcessor ip) {
    int w = ip.getWidth();
    int h = ip.getHeight();

    // retrieve the lookup tables (maps) for R, G, B
    IndexColorModel icm =
        (IndexColorModel) ip.getColorModel();
    int nColors = icm.getMapSize();
    byte[] pRed = new byte[nColors];
    byte[] pGrn = new byte[nColors];
    byte[] pBlu = new byte[nColors];
    icm.getReds(pRed);
    icm.getGreens(pGrn);
    icm.getBlues(pBlu);

    // create a new 24-bit RGB image:
    ColorProcessor cp = new ColorProcessor(w, h);
    int[] RGB = new int[3];
    for (int v = 0; v < h; v++) {
      for (int u = 0; u < w; u++) {
        int idx = ip.getPixel(u, v);
        RGB[R] = 0xFF & pRed[idx];
        RGB[G] = 0xFF & pGrn[idx];
        RGB[B] = 0xFF & pBlu[idx];
        cp.putPixel(u, v, RGB);
      }
    }
    ImagePlus cwin =
        new ImagePlus(imp.getShortTitle() + " (RGB)", cp);
    cwin.show();
  }
}

Creating indexed images

In ImageJ, no special method is provided for the creation of indexed images, so in almost all cases they are generated by converting an existing image. The following method demonstrates how to create an indexed image directly, if required:

// requires: import ij.process.ByteProcessor;
//           import java.awt.image.IndexColorModel;
ByteProcessor makeIndexColorImage(int w, int h, int nColors) {
  byte[] rMap = new byte[nColors]; // red, green, blue color maps
  byte[] gMap = new byte[nColors];
  byte[] bMap = new byte[nColors];
  // color maps need to be filled here
  byte[] pixels = new byte[w * h];
  // pixel array (palette indices) needs to be filled here
  IndexColorModel cm =
      new IndexColorModel(8, nColors, rMap, gMap, bMap);
  return new ByteProcessor(w, h, pixels, cm);
}

The parameter nColors defines the number of colors (and thus the size of the palette) and must be a value in the range 2, ..., 256. To use the above template, complete it with code that fills the three byte arrays for the RGB components (rMap, gMap, bMap) and the index array (pixels) with the appropriate values.
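As an illustration of how the template might be completed, the following sketch fills the three color maps with a gray ramp and the index array with a horizontal gradient of palette indices. To keep it runnable without ImageJ, it uses only java.awt.image and returns a BufferedImage instead of a ByteProcessor; the class name IndexedImageDemo and the choice of a gray-ramp palette are illustrative assumptions, not part of the original method.

```java
import java.awt.image.BufferedImage;
import java.awt.image.IndexColorModel;

// Hypothetical demo class (not part of ImageJ): builds an 8-bit indexed image
// with a gray-ramp palette, filling the color maps and the index array that the
// makeIndexColorImage() template leaves empty.
final class IndexedImageDemo {

    static BufferedImage makeIndexedImage(int w, int h, int nColors) {
        if (nColors < 2 || nColors > 256)
            throw new IllegalArgumentException("nColors must be in 2..256");
        byte[] rMap = new byte[nColors];
        byte[] gMap = new byte[nColors];
        byte[] bMap = new byte[nColors];
        // palette entry k holds the gray level k * 255 / (nColors - 1)
        for (int k = 0; k < nColors; k++) {
            byte level = (byte) (k * 255 / (nColors - 1));
            rMap[k] = level;
            gMap[k] = level;
            bMap[k] = level;
        }
        IndexColorModel cm = new IndexColorModel(8, nColors, rMap, gMap, bMap);
        BufferedImage img =
            new BufferedImage(w, h, BufferedImage.TYPE_BYTE_INDEXED, cm);
        // pixel (u, v) stores the palette index u * nColors / w (a horizontal ramp)
        for (int v = 0; v < h; v++)
            for (int u = 0; u < w; u++)
                img.getRaster().setSample(u, v, 0, u * nColors / w);
        return img;
    }
}
```

Calling makeIndexedImage(16, 4, 4) yields four vertical bands (black, two grays, white); in ImageJ the same filled arrays would instead be passed to the ByteProcessor constructor, as in the template.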

Transparency 

Transparency is one of the reasons indexed images are often used for Web graphics. In an indexed image, one of the index values can be designated as transparent, so that the background beneath the image shows through at all image locations that carry this index. In Java this can be controlled when creating the image's color model (IndexColorModel). As an example, to make color index 2 in Prog. 12.3 transparent, lines 38–40 would need to be modified as follows:

int tidx = 2; // index of transparent color
IndexColorModel icm2 =
    new IndexColorModel(pixBits, nColors, pRed, pGrn, pBlu, tidx);
ip.setColorModel(icm2);

At this time, however, ImageJ does not support the transparency property; it is not considered during display, and it is lost when the image is saved.
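Outside of ImageJ, the effect of the transparent index can be checked directly on the color model. The sketch below builds a small four-entry palette (black, red, green, blue) and marks index 2 as transparent using the same six-argument IndexColorModel constructor as above; the class name and the palette contents are illustrative assumptions.

```java
import java.awt.image.IndexColorModel;

// Sketch (hypothetical class): a 2-bit, 4-entry palette
// (black, red, green, blue) in which one index is declared transparent.
final class TransparentPaletteDemo {

    static IndexColorModel makePalette(int transparentIndex) {
        byte[] r = {0, (byte) 255, 0, 0};
        byte[] g = {0, 0, (byte) 255, 0};
        byte[] b = {0, 0, 0, (byte) 255};
        // the last argument names the palette index that is rendered fully transparent
        return new IndexColorModel(2, 4, r, g, b, transparentIndex);
    }
}
```

For the palette returned by makePalette(2), getTransparentPixel() reports index 2 and getAlpha(2) is 0, while all other entries stay fully opaque.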


12.2  Color  Spaces  and  Color  Conversion 

The RGB color system is well suited for use in programming, as it is simple to manipulate and maps directly to typical display hardware. When modifying colors within the RGB space, it is important to remember that the metric, or measured distance within this color space, does not correspond proportionally to our perception of color (e.g., doubling the value of the red component does not necessarily result in a color that appears twice as red). In general, modifying different color points in this space by the same amount can cause very different changes in perceived color. In addition, brightness changes in the RGB color space are also perceived as nonlinear.

Since changing any component modifies color tone, saturation, and brightness all at once, color selection in RGB space is difficult and quite non-intuitive. Color selection is more intuitive in other color spaces, such as the HSV space (see Sec. 12.2.3), since perceptual color features, such as saturation, are represented individually and can be modified independently. Alternatives to the RGB color space are also used in applications such as the automatic separation of objects from a colored background (the blue box technique in television), the encoding of television signals for transmission, and printing, and are thus also relevant in digital image processing.




Fig. 12.9

Examples of the color distribution of natural images. Original images: landscape photograph with dominant green and blue components and sun-spot image with rich red and yellow components (a). Distribution of image colors in RGB space (b).


Figure 12.9 shows the distribution of the colors from natural images in the RGB color space. This section introduces alternative color spaces and the methods of converting between them, and also discusses the choices that need to be made to correctly convert a color image to grayscale. In addition to the classical color systems most widely used in programming, precise reference systems, such as the CIEXYZ color space, are becoming increasingly important in practical color processing.

12.2.1  Conversion  to  Grayscale 

The conversion of an RGB color image to a grayscale image proceeds by computing the equivalent gray or luminance value $Y$ for each RGB pixel. In its simplest form, $Y$ could be computed as the average

$$Y = \mathrm{Avg}(R, G, B) = \frac{R + G + B}{3} \tag{12.7}$$

of the three color components $R$, $G$, and $B$. Since we perceive both red and green as being substantially brighter than blue, the resulting image will appear too dark in the red and green areas and too bright in the blue ones. Therefore, a weighted sum of the color components is typically used for calculating the equivalent brightness or luminance in the form

$$Y = \mathrm{Lum}(R, G, B) = w_R \cdot R + w_G \cdot G + w_B \cdot B. \tag{12.8}$$

The most commonly used weights, which were originally developed for encoding analog color television signals (see Sec. 12.2.4), are

$$w_R = 0.299, \qquad w_G = 0.587, \qquad w_B = 0.114, \tag{12.9}$$

and the weights recommended in ITU-BT.709 [122] for digital color encoding are

$$w_R = 0.2126, \qquad w_G = 0.7152, \qquad w_B = 0.0722. \tag{12.10}$$

If each color component is assigned the same weight ($w_R = w_G = w_B = 1/3$), as in Eqn. (12.7), this is of course just a special case of Eqn. (12.8).
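The three weight sets can be compared side by side with a few lines of Java. This is a minimal sketch of Eqn. (12.8) with the weights of Eqns. (12.7), (12.9), and (12.10); the class and field names are illustrative assumptions.

```java
// Sketch: equivalent gray value as a weighted sum, Eqn. (12.8).
// Class and field names are illustrative, not from any library.
final class Luminance {
    static final double[] W_AVG   = {1.0 / 3, 1.0 / 3, 1.0 / 3}; // Eqn. (12.7)
    static final double[] W_TV    = {0.299, 0.587, 0.114};       // Eqn. (12.9)
    static final double[] W_BT709 = {0.2126, 0.7152, 0.0722};    // Eqn. (12.10)

    static double lum(int r, int g, int b, double[] w) {
        return w[0] * r + w[1] * g + w[2] * b;
    }

    public static void main(String[] args) {
        int[][] colors = {{255, 0, 0}, {0, 255, 0}, {0, 0, 255}};
        for (int[] c : colors)
            System.out.printf("RGB(%3d,%3d,%3d): avg=%5.1f  tv=%5.1f  bt709=%5.1f%n",
                c[0], c[1], c[2],
                lum(c[0], c[1], c[2], W_AVG),
                lum(c[0], c[1], c[2], W_TV),
                lum(c[0], c[1], c[2], W_BT709));
    }
}
```

With equal weights all three primaries map to the same gray (85), whereas both weighted variants rank pure green brightest and pure blue darkest, in line with the perceptual argument above.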

Note that, although these weights were developed for use with TV signals, they are optimized for linear RGB component values, that is, signals with no gamma correction. In many practical situations, however, the RGB components are actually nonlinear, particularly when we work with sRGB images (see Ch. 14, Sec. 14.4). In this case, the RGB components must first be linearized to obtain the correct luminance values with the aforementioned weights.
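A minimal sketch of this linearization step, assuming the input is standard sRGB: each 8-bit component is decoded with the sRGB transfer curve from IEC 61966-2-1 (treated in detail in Ch. 14) before the BT.709 weights of Eqn. (12.10) are applied. The class and method names are our own.

```java
// Sketch, assuming standard sRGB input (IEC 61966-2-1): decode each
// component to linear RGB first, then apply the BT.709 weights of Eqn. (12.10).
final class LinearLuminance {

    // sRGB decoding: nonlinear component c in [0,1] -> linear component in [0,1]
    static double srgbToLinear(double c) {
        return (c <= 0.04045) ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
    }

    // relative (linear) luminance in [0,1] of an 8-bit sRGB color
    static double luminance(int r, int g, int b) {
        double rl = srgbToLinear(r / 255.0);
        double gl = srgbToLinear(g / 255.0);
        double bl = srgbToLinear(b / 255.0);
        return 0.2126 * rl + 0.7152 * gl + 0.0722 * bl;
    }
}
```

For the sRGB mid-gray (128, 128, 128) the linear luminance is only about 0.216, not 0.5, which shows how far the nonlinear and linear scales differ; to store the result as an sRGB gray value it would have to be re-encoded with the forward sRGB curve.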

In some color systems, instead of a weighted sum of the RGB color components, a nonlinear brightness function, for example the value $V$ in HSV (Eqn. (12.14) in Sec. 12.2.3) or the luminance $L$ in HLS (Eqn. (12.25)), is used as the intensity value $Y$.


Hueless  (gray)  color  images 

An RGB image is hueless or gray when the RGB components of each pixel $I(u, v) = (R, G, B)$ are the same; that is, if

$$R = G = B.$$

Therefore, to completely remove the color from an RGB image, simply replace the $R$, $G$, and $B$ components of each pixel with the equivalent gray value $Y$,

$$\begin{pmatrix} R_{\mathrm{gray}} \\ G_{\mathrm{gray}} \\ B_{\mathrm{gray}} \end{pmatrix} = \begin{pmatrix} Y \\ Y \\ Y \end{pmatrix}, \tag{12.11}$$

by using $Y = \mathrm{Lum}(R, G, B)$ from Eqns. (12.8) and (12.9), for example. The resulting grayscale image should have the same subjective brightness as the original color image.
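Applied to a single pixel in Java's int-packed format, Eqn. (12.11) might look like the following sketch. It uses the weights of Eqn. (12.9), rounds $Y$ to the nearest integer, and leaves the top (alpha) byte untouched; the class name HuelessPixel is hypothetical.

```java
// Sketch: removes the color from one int-packed pixel (0xAARRGGBB) by
// setting R = G = B = Y, Eqn. (12.11), with Y from Eqns. (12.8) and (12.9).
final class HuelessPixel {

    static int toGray(int c) {
        int r = (c >> 16) & 0xff;
        int g = (c >> 8) & 0xff;
        int b = c & 0xff;
        int y = (int) Math.round(0.299 * r + 0.587 * g + 0.114 * b);
        y = Math.min(255, Math.max(0, y));                   // keep Y in [0, 255]
        return (c & 0xff000000) | (y << 16) | (y << 8) | y; // alpha byte is kept as is
    }
}
```

For example, pure red 0x00FF0000 becomes 0x004C4C4C (gray value 76), while a pixel that is already gray, such as 0x00808080, is returned unchanged.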


Grayscale  conversion  in  ImageJ 

In ImageJ, the simplest way to convert an RGB color image (of type ColorProcessor) into an 8-bit grayscale image is to use the ImageProcessor method

convertToByteProcessor(),

which returns a new image of type ByteProcessor. ImageJ uses the default weights $w_R = w_G = w_B = 1/3$ (as in Eqn. (12.7)) for the RGB components, or alternatively $w_R = 0.299$, $w_G = 0.587$, $w_B = 0.114$ (as in Eqn. (12.9)) if the "Weighted RGB Conversions" option is selected in the Edit > Options > Conversions dialog. Arbitrary component weights can be specified for subsequent conversion operations through the ColorProcessor method

setRGBWeights(double wR, double wG, double wB).

Similarly, the static method getWeightingFactors() of class ColorProcessor can be used to retrieve the current component weights as a 3-element double array. Note that no linearization is performed on the color components, which should be considered when working with (nonlinear) sRGB colors (see Ch. 14, Sec. 14.4 for details).




Fig. 12.10

Desaturation in RGB space: original color point C = (R, G, B), its corresponding gray point G = (Y, Y, Y), and the desaturated color point D. Saturation is controlled by the factor s.


12.2.2  Desaturating  RGB  Color  Images 

Desaturation is the uniform reduction of the amount of color in an RGB image in a continuous manner. It is done by replacing each RGB pixel by a desaturated color obtained by linear interpolation between the pixel's original color and the corresponding $(Y, Y, Y)$ gray point in the RGB space, that is,

$$\begin{pmatrix} R_{\mathrm{desat}} \\ G_{\mathrm{desat}} \\ B_{\mathrm{desat}} \end{pmatrix} = \begin{pmatrix} Y \\ Y \\ Y \end{pmatrix} + s \cdot \begin{pmatrix} R - Y \\ G - Y \\ B - Y \end{pmatrix}, \tag{12.12}$$

again with $Y = \mathrm{Lum}(R, G, B)$ from Eqns. (12.8) and (12.9), where the factor $s \in [0, 1]$ controls the remaining amount of color saturation (Fig. 12.10). A value of $s = 0$ completely eliminates all color, resulting in a true grayscale image, and with $s = 1$ the color values will be unchanged. In Prog. 12.5, continuous desaturation as defined in Eqn. (12.12) is implemented as an ImageJ plugin.
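The per-pixel step of Eqn. (12.12) can also be factored out into a small function that is easy to test without ImageJ. This sketch differs from Prog. 12.5 in one design choice: it rounds to the nearest integer and clamps to [0, 255] instead of truncating with an (int) cast. The class name Desaturation is hypothetical.

```java
// Sketch of the per-pixel step of Eqn. (12.12), factored out so it can be
// tested without ImageJ. Unlike Prog. 12.5, it rounds instead of truncating.
final class Desaturation {

    static int[] desaturate(int r, int g, int b, double s) {
        double y = 0.299 * r + 0.587 * g + 0.114 * b; // Y = Lum(R, G, B)
        return new int[] {
            toByte(y + s * (r - y)),
            toByte(y + s * (g - y)),
            toByte(y + s * (b - y))
        };
    }

    // round to the nearest integer and clamp to [0, 255]
    private static int toByte(double x) {
        return (int) Math.min(255, Math.max(0, Math.round(x)));
    }
}
```

For the color (200, 100, 50), $Y = 124.2$, so s = 0 yields (124, 124, 124), s = 1 returns the original triple, and intermediate values of s move each component linearly between these two points.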

In color spaces where color saturation is represented by an explicit component (such as HSV and HLS), desaturation is of course much easier to accomplish (by simply scaling down the saturation value, or setting it to zero for complete desaturation).




12.2.3  HSV/HSB  and  HLS  Color  Spaces 

In the HSV color space, colors are specified by the components hue, saturation, and value. Often, such as in Adobe products and the Java API, the HSV space is called HSB. While the acronym is different (in this case, B = brightness),⁶ it denotes the same color space. The HSV color space is traditionally shown as an upside-down, six-sided pyramid (Fig. 12.11(a)), where the vertical axis represents the V (brightness) value, the horizontal distance from the axis the S (saturation) value, and the angle the H (hue) value. The black point is at the tip of the pyramid and the white point lies in the center of the base. The three primary colors red, green, and blue and the pairwise mixed colors yellow, cyan, and magenta are the corner points of the base. While this space is often represented as a pyramid, according to its mathematical definition the space is actually a cylinder, as shown in Fig. 12.12.

6 Sometimes the HSV space is also referred to as the "HSI" space, where "I" stands for intensity.

 1  // File Desaturate_Rgb.java
 2
 3  import ij.ImagePlus;
 4  import ij.plugin.filter.PlugInFilter;
 5  import ij.process.ImageProcessor;
 6
 7  public class Desaturate_Rgb implements PlugInFilter {
 8    double s = 0.3; // color saturation value
 9
10    public int setup(String arg, ImagePlus imp) {
11      return DOES_RGB;
12    }
13
14    public void run(ImageProcessor ip) {
15      // iterate over all pixels:
16      for (int v = 0; v < ip.getHeight(); v++) {
17        for (int u = 0; u < ip.getWidth(); u++) {
18
19          // get int-packed color pixel:
20          int c = ip.get(u, v);
21
22          // extract RGB components from color pixel
23          int r = (c & 0xff0000) >> 16;
24          int g = (c & 0x00ff00) >> 8;
25          int b = (c & 0x0000ff);
26
27          // compute equiv. gray value:
28          double y = 0.299 * r + 0.587 * g + 0.114 * b;
29
30          // linear interpolation (y, y, y) -> (r, g, b):
31          r = (int) (y + s * (r - y));
32          g = (int) (y + s * (g - y));
33          b = (int) (y + s * (b - y));
34
35          // reassemble the color pixel:
36          c = ((r & 0xff) << 16) | ((g & 0xff) << 8) | (b & 0xff);
37          ip.set(u, v, c);
38        }
39      }
40    }
41
42  }

Prog. 12.5

Continuous desaturation of an RGB color image (ImageJ plugin). The amount of color saturation is controlled by the variable s defined in line 8 (see Eqn. (12.12)).

The HLS color space⁷ (hue, luminance, saturation) is very similar to the HSV space, and the hue component is in fact completely identical in both spaces. The luminance and saturation values also correspond to the vertical axis and the radius, respectively, but are defined differently than in HSV space. The common representation of the HLS space is as a double pyramid (Fig. 12.11(b)), with black on the bottom tip and white on the top. The primary colors lie on the corner points of the hexagonal base between the two pyramids. Even though it is often portrayed in this intuitive way, mathematically the HLS space is again a cylinder (see Fig. 12.15).

7 The acronyms HLS and HSL are used interchangeably.

Fig. 12.11

HSV and HLS color space are traditionally visualized as a single or double hexagonal pyramid. The brightness V (or L) is represented by the vertical dimension, the color saturation S by the radius from the pyramid's axis, and the hue H by the angle. In both cases, the primary colors red (R), green (G), and blue (B) and the mixed colors yellow (Y), cyan (C), and magenta (M) lie on a common plane with black (S) at the tip. The essential difference between the HSV and HLS color spaces is the location of the white point (W).


RGB→HSV conversion


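To convert from RGB to the HSV color space, we first find the saturation of the RGB color components $R, G, B \in [0, C_{\max}]$, with $C_{\max}$ being the maximum component value (typically 255), as

$$S_{\mathrm{HSV}} = \begin{cases} \dfrac{C_{\mathrm{rng}}}{C_{\mathrm{high}}} & \text{for } C_{\mathrm{high}} > 0,\\[4pt] 0 & \text{otherwise}, \end{cases} \tag{12.13}$$

and the brightness (value)

$$V_{\mathrm{HSV}} = \frac{C_{\mathrm{high}}}{C_{\max}}, \tag{12.14}$$

with

$$C_{\mathrm{high}} = \max(R, G, B), \qquad C_{\mathrm{low}} = \min(R, G, B), \qquad C_{\mathrm{rng}} = C_{\mathrm{high}} - C_{\mathrm{low}}. \tag{12.15}$$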


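Finally, we need to specify the hue value $H_{\mathrm{HSV}}$. When all three RGB color components have the same value ($R = G = B$), we are dealing with an achromatic (gray) pixel. In this particular case $C_{\mathrm{rng}} = 0$ and thus the saturation value $S_{\mathrm{HSV}} = 0$; consequently, the hue is undefined. To calculate $H_{\mathrm{HSV}}$ when $C_{\mathrm{rng}} > 0$, we first normalize each component using

$$R' = \frac{C_{\mathrm{high}} - R}{C_{\mathrm{rng}}}, \qquad G' = \frac{C_{\mathrm{high}} - G}{C_{\mathrm{rng}}}, \qquad B' = \frac{C_{\mathrm{high}} - B}{C_{\mathrm{rng}}}. \tag{12.16}$$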


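Then, depending on which of the three original color components had the maximal value, we compute a preliminary hue $H'$ as

$$H' = \begin{cases} B' - G' & \text{for } R = C_{\mathrm{high}},\\ R' - B' + 2 & \text{for } G = C_{\mathrm{high}},\\ G' - R' + 4 & \text{for } B = C_{\mathrm{high}}. \end{cases} \tag{12.17}$$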
Since the resulting value for $H'$ lies on the interval $[-1, 5]$, we obtain the final hue value by normalizing to the interval $[0, 1]$ as

$$H_{\mathrm{HSV}} = \frac{1}{6} \cdot \begin{cases} H' + 6 & \text{for } H' < 0,\\ H' & \text{otherwise}. \end{cases} \tag{12.18}$$


Hence all three components $H_{\mathrm{HSV}}$, $S_{\mathrm{HSV}}$, and $V_{\mathrm{HSV}}$ will lie within the interval $[0, 1]$. The hue value $H_{\mathrm{HSV}}$ can naturally also be computed in another angle interval, for example, in the 0 to 360° interval using

$$H^{\circ}_{\mathrm{HSV}} = H_{\mathrm{HSV}} \cdot 360. \tag{12.19}$$
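As a worked illustration of Eqns. (12.13)–(12.19), take the color $(R, G, B) = (128, 255, 0)$, the same values used in the Java usage example of this section, with $C_{\max} = 255$ (the intermediate values are rounded to three decimals):

```latex
\begin{aligned}
C_{\mathrm{high}} &= 255, \quad C_{\mathrm{low}} = 0, \quad C_{\mathrm{rng}} = 255,\\
S_{\mathrm{HSV}} &= \frac{255}{255} = 1, \qquad V_{\mathrm{HSV}} = \frac{255}{255} = 1,\\
R' &= \frac{255 - 128}{255} \approx 0.498, \quad G' = \frac{255 - 255}{255} = 0, \quad B' = \frac{255 - 0}{255} = 1,\\
H' &= R' - B' + 2 \approx 1.498 \qquad (\text{since } G = C_{\mathrm{high}}),\\
H_{\mathrm{HSV}} &= \frac{1.498}{6} \approx 0.250, \qquad H^{\circ}_{\mathrm{HSV}} \approx 0.2497 \cdot 360 \approx 89.9^{\circ}.
\end{aligned}
```

The result is a fully saturated, fully bright yellowish green whose hue lies just short of 90°.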


Under this definition, the RGB space unit cube is mapped to a cylinder with height and radius of length 1 (Fig. 12.12). In contrast to the traditional representation (Fig. 12.11), all HSV points within the entire cylinder correspond to valid color coordinates in RGB space. The mapping from RGB to the HSV space is nonlinear, as can be noted by examining how the black point stretches completely across the cylinder's base. Figure 12.12 plots the location of some notable color points and compares them with their locations in RGB space (see also Fig. 12.1). Figure 12.13 shows the individual HSV components (in grayscale) of the test image in Fig. 12.2.


Fig. 12.12

HSV color space. The illustration shows the HSV color space as a cylinder with the coordinates H (hue) as the angle, S (saturation) as the radius, and V (brightness value) as the distance along the vertical axis, which runs between the black point S and the white point W. The table lists the (R, G, B) and (H, S, V) values of the color points marked on the graphic. Pure colors (composed of only one or two components) lie on the outer wall of the cylinder (S = 1), as exemplified by the gradually saturated reds (R25, R50, R75, R).


Fig. 12.13

HSV components for the test image in Fig. 12.2. The darker areas in the $H_{\mathrm{HSV}}$ component correspond to the red and yellow colors, where the hue angle is near zero.


Java  implementation 

In Java, the RGB→HSV conversion is implemented in the standard AWT Color class by the static method

float[] RGBtoHSB(int r, int g, int b, float[] hsv)

(HSV and HSB denote the same color space). The method takes three int arguments r, g, b (within the range [0, 255]) and returns a float array with the resulting H, S, V values in the interval [0, 1]. When an existing float array is passed as the argument hsv, the result is placed in it; otherwise (when hsv = null) a new array is created. Here is a simple example:

import java.awt.Color;
...
float[] hsv = new float[3];
int red = 128, green = 255, blue = 0;
hsv = Color.RGBtoHSB(red, green, blue, hsv);
float h = hsv[0];
float s = hsv[1];
float v = hsv[2];
...

A possible implementation of the Java method RGBtoHSB() using the definitions in Eqns. (12.13)-(12.18) is given in Prog. 12.6.
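Since Color.RGBtoHSB() returns the hue in [0, 1], converting it to degrees as in Eqn. (12.19) is a one-line step. The following sketch wraps both steps; the class and method names are our own.

```java
import java.awt.Color;

// Sketch: HSV values via the standard AWT method, with the hue rescaled
// to degrees as in Eqn. (12.19). Class and method names are illustrative.
final class HsvDegrees {

    static float[] rgbToHsvDegrees(int r, int g, int b) {
        float[] hsv = Color.RGBtoHSB(r, g, b, null); // H, S, V in [0, 1]
        hsv[0] = hsv[0] * 360f;                      // H in [0, 360)
        return hsv;
    }

    public static void main(String[] args) {
        float[] hsv = rgbToHsvDegrees(128, 255, 0);
        System.out.printf("H = %.1f deg, S = %.2f, V = %.2f%n", hsv[0], hsv[1], hsv[2]);
    }
}
```

Pure red, green, and blue come out at 0°, 120°, and 240°, respectively; for gray pixels the saturation is 0 and the returned hue (0) carries no meaning.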

HSV→RGB conversion

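To convert an HSV tuple $(H_{\mathrm{HSV}}, S_{\mathrm{HSV}}, V_{\mathrm{HSV}})$, where $H_{\mathrm{HSV}}, S_{\mathrm{HSV}}, V_{\mathrm{HSV}} \in [0, 1]$, into the corresponding $(R, G, B)$ color values, the appropriate color sector

$$H' = (6 \cdot H_{\mathrm{HSV}}) \bmod 6 \tag{12.20}$$

(with $0 \le H' < 6$) is determined first, followed by computing the intermediate values

$$\begin{aligned} c_1 &= \lfloor H' \rfloor, & x &= (1 - S_{\mathrm{HSV}}) \cdot V_{\mathrm{HSV}},\\ c_2 &= H' - c_1, & y &= (1 - S_{\mathrm{HSV}} \cdot c_2) \cdot V_{\mathrm{HSV}},\\ & & z &= (1 - S_{\mathrm{HSV}} \cdot (1 - c_2)) \cdot V_{\mathrm{HSV}}. \end{aligned} \tag{12.21}$$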


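Depending on the value of $c_1$, the normalized RGB values $R', G', B' \in [0, 1]$ are then calculated from $V_{\mathrm{HSV}}$, $x$, $y$, and $z$ as follows:

$$(R', G', B') \leftarrow \begin{cases} (V_{\mathrm{HSV}}, z, x) & \text{for } c_1 = 0,\\ (y, V_{\mathrm{HSV}}, x) & \text{for } c_1 = 1,\\ (x, V_{\mathrm{HSV}}, z) & \text{for } c_1 = 2,\\ (x, y, V_{\mathrm{HSV}}) & \text{for } c_1 = 3,\\ (z, x, V_{\mathrm{HSV}}) & \text{for } c_1 = 4,\\ (V_{\mathrm{HSV}}, x, y) & \text{for } c_1 = 5. \end{cases} \tag{12.22}$$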


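Scaling the RGB components back to integer values in the range [0, 255] is carried out as follows:

$$\begin{aligned} R &\leftarrow \min(\mathrm{round}(255 \cdot R'),\, 255),\\ G &\leftarrow \min(\mathrm{round}(255 \cdot G'),\, 255),\\ B &\leftarrow \min(\mathrm{round}(255 \cdot B'),\, 255). \end{aligned} \tag{12.23}$$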
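As a worked illustration of Eqns. (12.20)–(12.23), take the HSV tuple $(H_{\mathrm{HSV}}, S_{\mathrm{HSV}}, V_{\mathrm{HSV}}) = (0.25, 1, 1)$, assuming a scale factor of 255 in Eqn. (12.23) and rounding of halves upward:

```latex
\begin{aligned}
H' &= (6 \cdot 0.25) \bmod 6 = 1.5, \qquad c_1 = \lfloor 1.5 \rfloor = 1, \qquad c_2 = 0.5,\\
x &= (1 - 1) \cdot 1 = 0, \qquad y = (1 - 1 \cdot 0.5) \cdot 1 = 0.5, \qquad z = (1 - 1 \cdot (1 - 0.5)) \cdot 1 = 0.5,\\
(R', G', B') &= (y, V_{\mathrm{HSV}}, x) = (0.5,\ 1,\ 0) \qquad (\text{since } c_1 = 1),\\
(R, G, B) &= \bigl(\min(\mathrm{round}(127.5), 255),\ \min(255, 255),\ \min(0, 255)\bigr) = (128, 255, 0).
\end{aligned}
```

A hue of exactly 90° thus lands on the 8-bit color (128, 255, 0).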


8  The  variables  x,  y,  z  used  here  are  not  related  to  the  CIEXYZ  color 
space  (see  Ch.  14,  Sec.  14.1). 
float[] RGBtoHSV(int[] RGB) {
  int R = RGB[0], G = RGB[1], B = RGB[2]; // R, G, B in [0, 255]
  int cHi = Math.max(R, Math.max(G, B));  // max. comp. value
  int cLo = Math.min(R, Math.min(G, B));  // min. comp. value
  int cRng = cHi - cLo;                   // component range
  float H = 0, S = 0, V = 0;
  float cMax = 255.0f;

  // compute value V
  V = cHi / cMax;

  // compute saturation S
  if (cHi > 0)
    S = (float) cRng / cHi;

  // compute hue H
  if (cRng > 0) { // hue is defined only for color pixels
    float rr = (float) (cHi - R) / cRng;
    float gg = (float) (cHi - G) / cRng;
    float bb = (float) (cHi - B) / cRng;
    float hh;
    if (R == cHi)      // R is largest component value
      hh = bb - gg;
    else if (G == cHi) // G is largest component value
      hh = rr - bb + 2.0f;
    else               // B is largest component value
      hh = gg - rr + 4.0f;
    if (hh < 0)
      hh = hh + 6;
    H = hh / 6;
  }
  return new float[] {H, S, V};
}


Java  implementation 

HSV→RGB conversion is implemented in Java's standard AWT Color class by the static method

int HSBtoRGB(float h, float s, float v),

which takes three float arguments h, s, v ∈ [0, 1] and returns the corresponding RGB color as an int value with 3 × 8 bits arranged in the standard Java RGB format (see Fig. 12.6); the remaining top 8 bits (alpha) are set to 0xff. One possible implementation of this method is shown in Prog. 12.7.
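Because HSBtoRGB() packs its result into a single int, the components have to be unpacked with shifts and masks. The following sketch does this and uses it for an RGB→HSV→RGB round trip with the two standard AWT methods; the class and method names are our own.

```java
import java.awt.Color;

// Sketch: RGB -> HSV -> RGB round trip with the standard AWT methods.
// Class and method names are illustrative.
final class HsvRoundTrip {

    // unpacks the 0xAARRGGBB int returned by Color.HSBtoRGB into {R, G, B}
    static int[] unpack(int argb) {
        return new int[] {(argb >> 16) & 0xff, (argb >> 8) & 0xff, argb & 0xff};
    }

    static int[] roundTrip(int r, int g, int b) {
        float[] hsv = Color.RGBtoHSB(r, g, b, null);
        return unpack(Color.HSBtoRGB(hsv[0], hsv[1], hsv[2]));
    }
}
```

For 8-bit inputs the round trip should reproduce the original components up to float rounding; the masking in unpack() also discards the alpha byte that HSBtoRGB() sets.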

Prog. 12.6

RGB→HSV conversion (Java implementation). This Java method for RGB→HSV conversion follows the process given in the text to compute a single color tuple. It returns the same results (up to floating-point rounding) as the standard Color.RGBtoHSB() method, although it takes the three RGB components as a single int array.

Prog. 12.7

HSV→RGB conversion (Java implementation).

int[] HSVtoRGB(float[] HSV) {
  float H = HSV[0], S = HSV[1], V = HSV[2]; // H, S, V in [0, 1]
  float r = 0, g = 0, b = 0;
  float hh = (6 * H) % 6;       // H' <- (6 * H) mod 6
  int c1 = (int) hh;            // c1 <- floor(H')
  float c2 = hh - c1;
  float x = (1 - S) * V;
  float y = (1 - (S * c2)) * V;
  float z = (1 - (S * (1 - c2))) * V;
  switch (c1) {
    case 0: r = V; g = z; b = x; break;
    case 1: r = y; g = V; b = x; break;
    case 2: r = x; g = V; b = z; break;
    case 3: r = x; g = y; b = V; break;
    case 4: r = z; g = x; b = V; break;
    case 5: r = V; g = x; b = y; break;
  }
  // scale back to [0, 255] with rounding, Eqn. (12.23)
  int R = Math.min(Math.round(r * 255), 255);
  int G = Math.min(Math.round(g * 255), 255);
  int B = Math.min(Math.round(b * 255), 255);
  return new int[] {R, G, B};
}

RGB→HLS conversion

In the HLS model, the hue value $H_{\mathrm{HLS}}$ is computed in the same way as in the HSV model (Eqns. (12.16)-(12.18)), that is,

$$H_{\mathrm{HLS}} = H_{\mathrm{HSV}}. \tag{12.24}$$

The other values, $L_{\mathrm{HLS}}$ and $S_{\mathrm{HLS}}$, are calculated as follows (for $C_{\mathrm{high}}$, $C_{\mathrm{low}}$, and $C_{\mathrm{rng}}$, see Eqn. (12.15)):

$$L_{\mathrm{HLS}} = \frac{(C_{\mathrm{high}} + C_{\mathrm{low}})/255}{2}, \tag{12.25}$$

$$S_{\mathrm{HLS}} = \begin{cases} 0 & \text{for } L_{\mathrm{HLS}} = 0,\\[4pt] 0.5 \cdot \dfrac{C_{\mathrm{rng}}/255}{L_{\mathrm{HLS}}} & \text{for } 0 < L_{\mathrm{HLS}} \le 0.5,\\[8pt] 0.5 \cdot \dfrac{C_{\mathrm{rng}}/255}{1 - L_{\mathrm{HLS}}} & \text{for } 0.5 < L_{\mathrm{HLS}} < 1,\\[8pt] 0 & \text{for } L_{\mathrm{HLS}} = 1. \end{cases} \tag{12.26}$$


Using the aforementioned definitions, the RGB color cube is again mapped to a cylinder with height and radius 1 (see Fig. 12.15). In contrast to the HSV space (Fig. 12.12), the primary colors lie together in the horizontal plane at $L_{\mathrm{HLS}} = 0.5$ and the white point lies outside of this plane at $L_{\mathrm{HLS}} = 1.0$. Using these nonlinear transformations, the black and the white points are mapped to the bottom and the top planes of the cylinder, respectively. All points inside the HLS cylinder correspond to valid colors in RGB space. Figure 12.14 shows the individual HLS components of the test image as grayscale images.
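As a worked illustration of Eqns. (12.24)–(12.26), consider the pink point P of Fig. 12.15, here approximated by the 8-bit color $(R, G, B) = (255, 128, 128)$:

```latex
\begin{aligned}
C_{\mathrm{high}} &= 255, \quad C_{\mathrm{low}} = 128, \quad C_{\mathrm{rng}} = 127,\\
L_{\mathrm{HLS}} &= \frac{(255 + 128)/255}{2} = \frac{383}{510} \approx 0.751,\\
S_{\mathrm{HLS}} &= 0.5 \cdot \frac{127/255}{1 - L_{\mathrm{HLS}}} = \frac{127/510}{127/510} = 1 \qquad (\text{since } 0.5 < L_{\mathrm{HLS}} < 1),\\
H_{\mathrm{HLS}} &= H_{\mathrm{HSV}} = 0 \qquad (R = C_{\mathrm{high}},\ G' = B' = 1 \ \Rightarrow\ H' = B' - G' = 0).
\end{aligned}
```

This agrees with the values S = 1.00 and L = 0.75 listed for P in the table of Fig. 12.15.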


Fig. 12.14

HLS color components $H_{\mathrm{HLS}}$ (hue), $S_{\mathrm{HLS}}$ (saturation), and $L_{\mathrm{HLS}}$ (luminance).
RGB/HLS values

Pt.  Color      R     G     B     H       S     L
S    Black      0.00  0.00  0.00  undef.  0.00  0.00
R    Red        1.00  0.00  0.00  0       1.00  0.50
Y    Yellow     1.00  1.00  0.00  1/6     1.00  0.50
G    Green      0.00  1.00  0.00  2/6     1.00  0.50
C    Cyan       0.00  1.00  1.00  3/6     1.00  0.50
B    Blue       0.00  0.00  1.00  4/6     1.00  0.50
M    Magenta    1.00  0.00  1.00  5/6     1.00  0.50
W    White      1.00  1.00  1.00  undef.  0.00  1.00
R75  75% Red    0.75  0.00  0.00  0       1.00  0.375
R50  50% Red    0.50  0.00  0.00  0       1.00  0.250
R25  25% Red    0.25  0.00  0.00  0       1.00  0.125
P    Pink       1.00  0.50  0.50  0       1.00  0.75

HLS→RGB conversion

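When converting from HLS to the RGB space, we assume that $H_{\mathrm{HLS}}, S_{\mathrm{HLS}}, L_{\mathrm{HLS}} \in [0, 1]$. In the case where $L_{\mathrm{HLS}} = 0$ or $L_{\mathrm{HLS}} = 1$, the result is

$$(R', G', B') = \begin{cases} (0, 0, 0) & \text{for } L_{\mathrm{HLS}} = 0,\\ (1, 1, 1) & \text{for } L_{\mathrm{HLS}} = 1. \end{cases} \tag{12.27}$$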


Fig. 12.15

HLS color space. The illustration shows the HLS color space visualized as a cylinder with the coordinates H (hue) as the angle, S (saturation) as the radius, and L (lightness) as the distance along the vertical axis, which runs between the black point S and the white point W. The table lists the (R, G, B) and (H, S, L) values, where "pure" colors (created using only one or two color components) lie on the lower half of the outer cylinder wall (S = 1), as illustrated by the gradually saturated reds (R25, R50, R75, R). Mixtures of all three primary colors, where at least one of the components is completely saturated, lie along the upper half of the outer cylinder wall; for example, the point P (pink).


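Otherwise, we again determine the appropriate color sector

$$H' = (6 \cdot H_{\mathrm{HLS}}) \bmod 6, \tag{12.28}$$

such that $0 \le H' < 6$, and from this

$$c_1 = \lfloor H' \rfloor, \qquad c_2 = H' - c_1, \tag{12.29}$$

$$d = \begin{cases} S_{\mathrm{HLS}} \cdot L_{\mathrm{HLS}} & \text{for } L_{\mathrm{HLS}} \le 0.5,\\ S_{\mathrm{HLS}} \cdot (1 - L_{\mathrm{HLS}}) & \text{for } L_{\mathrm{HLS}} > 0.5, \end{cases} \tag{12.30}$$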


and the quantities

$$w = L_{\mathrm{HLS}} + d, \qquad x = L_{\mathrm{HLS}} - d, \tag{12.31}$$

$$y = w - (w - x) \cdot c_2, \qquad z = x + (w - x) \cdot c_2. \tag{12.32}$$


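The final mapping to the RGB values is (similar to Eqn. (12.22))

$$(R', G', B') \leftarrow \begin{cases} (w, z, x) & \text{for } c_1 = 0,\\ (y, w, x) & \text{for } c_1 = 1,\\ (x, w, z) & \text{for } c_1 = 2,\\ (x, y, w) & \text{for } c_1 = 3,\\ (z, x, w) & \text{for } c_1 = 4,\\ (w, x, y) & \text{for } c_1 = 5. \end{cases} \tag{12.33}$$

Scaling the normalized $R', G', B' \in [0, 1]$ color components back to the [0, 255] range is done as in Eqn. (12.23).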
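A minimal sketch of these HLS→RGB steps in Java might look as follows. It is our own illustration of Eqns. (12.27)–(12.33), not the book's Prog. 12.9; the class name HlsToRgb, the argument order (H, L, S) (chosen to match the array returned by RGBtoHLS() in Prog. 12.8), and the final scaling with factor 255 and rounding are assumptions.

```java
// Sketch of HLS -> RGB following Eqns. (12.27)-(12.33); not the book's Prog. 12.9.
// Arguments H, L, S in [0, 1], in the order returned by RGBtoHLS() in Prog. 12.8.
final class HlsToRgb {

    static int[] hlsToRgb(float h, float l, float s) {
        float r, g, b;
        if (l <= 0) {                    // black, Eqn. (12.27)
            r = g = b = 0;
        } else if (l >= 1) {             // white, Eqn. (12.27)
            r = g = b = 1;
        } else {
            float hh = (6 * h) % 6;      // color sector H', Eqn. (12.28)
            int c1 = (int) hh;           // Eqn. (12.29)
            float c2 = hh - c1;
            float d = (l <= 0.5f) ? s * l : s * (1 - l); // Eqn. (12.30)
            float w = l + d;             // Eqn. (12.31)
            float x = l - d;
            float y = w - (w - x) * c2;  // Eqn. (12.32)
            float z = x + (w - x) * c2;
            switch (c1) {                // Eqn. (12.33)
                case 0:  r = w; g = z; b = x; break;
                case 1:  r = y; g = w; b = x; break;
                case 2:  r = x; g = w; b = z; break;
                case 3:  r = x; g = y; b = w; break;
                case 4:  r = z; g = x; b = w; break;
                default: r = w; g = x; b = y; break; // c1 == 5
            }
        }
        // scale back to [0, 255] with rounding, as in Eqn. (12.23)
        return new int[] {
            Math.min(Math.round(r * 255), 255),
            Math.min(Math.round(g * 255), 255),
            Math.min(Math.round(b * 255), 255)
        };
    }
}
```

For example, hlsToRgb(0, 0.75f, 1) yields the pink (255, 128, 128) and hlsToRgb(0, 0.5f, 1) pure red, matching the corresponding rows of the RGB/HLS table above.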
Prog. 12.8

RGB→HLS conversion (Java implementation).

float[] RGBtoHLS(int[] RGB) {
  int R = RGB[0], G = RGB[1], B = RGB[2]; // R, G, B in [0, 255]

  float cHi = Math.max(R, Math.max(G, B));
  float cLo = Math.min(R, Math.min(G, B));
  float cRng = cHi - cLo; // component range

  // compute lightness L
  float L = ((cHi + cLo) / 255f) / 2;

  // compute saturation S
  float S = 0;
  if (0 < L && L < 1) {
    float d = (L <= 0.5f) ? L : (1 - L);
    S = 0.5f * (cRng / 255f) / d;
  }

  // compute hue H (same as in HSV)
  float H = 0;
  if (cHi > 0 && cRng > 0) { // this is a color pixel!
    float r = (float) (cHi - R) / cRng;
    float g = (float) (cHi - G) / cRng;
    float b = (float) (cHi - B) / cRng;
    float h;
    if (R == cHi)      // R is largest component
      h = b - g;
    else if (G == cHi) // G is largest component
      h = r - b + 2.0f;
    else               // B is largest component
      h = g - r + 4.0f;
    if (h < 0)
      h = h + 6;
    H = h / 6;
  }
  return new float[] {H, L, S};
}

Java implementation

Currently there is no method in either the standard Java API or ImageJ for converting color values between RGB and HLS. Prog. 12.8 gives one possible implementation of the RGB→HLS conversion that follows the definitions in Eqns. (12.24)-(12.26). The HLS→RGB conversion is shown in Prog. 12.9.

HSV  and  HLS  compared 

Despite the obvious similarity between the two color spaces, substantial differences in the V/L and S components do exist, as Fig. 12.16 illustrates. The essential difference between the HSV and HLS spaces is the ordering of the colors that lie between the white point W and the “pure” colors (R, G, B, Y, C, M), which consist of at most two primary colors, at least one of which is completely saturated.
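The following minimal sketch makes this difference concrete for a single 8-bit RGB color. The class and method names are hypothetical; the HLS part mirrors Prog. 12.8, and the HSV part assumes the usual definitions V = C_high/255 and S_HSV = (C_high − C_low)/C_high.

```java
/** Computes HSV and HLS brightness and saturation side by side (illustrative sketch). */
final class HsvHlsCompare {

    /** Returns {V, S_HSV} for R, G, B in [0, 255]. */
    static float[] valueAndSaturationHSV(int r, int g, int b) {
        int cHi = Math.max(r, Math.max(g, b));
        int cLo = Math.min(r, Math.min(g, b));
        float v = cHi / 255f;                                  // V = largest component
        float s = (cHi > 0) ? (float) (cHi - cLo) / cHi : 0f;  // S_HSV = range / max
        return new float[] {v, s};
    }

    /** Returns {L, S_HLS} for R, G, B in [0, 255], following Prog. 12.8. */
    static float[] lightnessAndSaturationHLS(int r, int g, int b) {
        int cHi = Math.max(r, Math.max(g, b));
        int cLo = Math.min(r, Math.min(g, b));
        float l = ((cHi + cLo) / 255f) / 2;                    // L = mean of max and min
        float s = 0f;
        if (0 < l && l < 1) {
            float d = (l <= 0.5f) ? l : (1 - l);
            s = 0.5f * ((cHi - cLo) / 255f) / d;
        }
        return new float[] {l, s};
    }
}
```

For the light red (255, 128, 128), this gives V = 1 and S_HSV ≈ 0.50, but L ≈ 0.75 and S_HLS ≈ 1, which is the higher HLS saturation in bright regions visible in Fig. 12.16.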

The difference in how colors are distributed in RGB, HSV, and HLS space is readily apparent in Fig. 12.17. The starting point was a distribution of 1331 (11 × 11 × 11) color tuples obtained by uniformly
12.2  Color  Spaces  and
Color  Conversion 

Prog. 12.9

HLS→RGB conversion (Java
implementation).


Fig. 12.16

HSV and HLS components compared. Saturation (top row) and intensity (bottom row). In the color saturation difference image $S_{\mathrm{HSV}} - S_{\mathrm{HLS}}$ (top), light areas correspond to positive values and dark areas to negative values. Saturation in the HLS representation, especially in the brightest sections of the image, is notably higher, resulting in negative values in the difference image. For the intensity (value and luminance, respectively), in general $V_{\mathrm{HSV}} \ge L_{\mathrm{HLS}}$, and therefore the difference $V_{\mathrm{HSV}} - L_{\mathrm{HLS}}$ (bottom) is never negative. The hue component H (not shown) is identical in both representations.

Panels (left to right): HSV, HLS, difference.


int[] HLStoRGB (float[] HLS) {
    float H = HLS[0], L = HLS[1], S = HLS[2];  // H,L,S in [0,1]
    float r = 0, g = 0, b = 0;
    if (L <= 0)          // black
        r = g = b = 0;
    else if (L >= 1)     // white
        r = g = b = 1;
    else {
        float hh = (6 * H) % 6;    // = H'
        int c1 = (int) hh;
        float c2 = hh - c1;
        float d = (L <= 0.5f) ? (S * L) : (S * (1 - L));
        float w = L + d;
        float x = L - d;
        float y = w - (w - x) * c2;
        float z = x + (w - x) * c2;
        switch (c1) {
            case 0: r = w; g = z; b = x; break;
            case 1: r = y; g = w; b = x; break;
            case 2: r = x; g = w; b = z; break;
            case 3: r = x; g = y; b = w; break;
            case 4: r = z; g = x; b = w; break;
            case 5: r = w; g = x; b = y; break;
        }
    }  // r, g, b in [0,1]
    int R = Math.min(Math.round(r * 255), 255);
    int G = Math.min(Math.round(g * 255), 255);
    int B = Math.min(Math.round(b * 255), 255);
    return new int[] {R, G, B};
}
Fig. 12.17

Distribution of colors in the RGB, HSV, and HLS spaces. The starting point is the uniform distribution of colors in RGB space (top). The corresponding colors in the cylindrical spaces are distributed nonsymmetrically in HSV and symmetrically in HLS.

Panels: HSV, HLS.


sampling the RGB space at an interval of 0.1 in each dimension. We can see clearly that in HSV space the maximally saturated colors (S = 1) form circular rings with increasing density toward the upper plane of the cylinder. In HLS space, however, the color samples are spread out symmetrically around the center plane, and the density is significantly lower, particularly in the region near white. A given coordinate shift in this part of the color space therefore leads to relatively small color changes, which allows the specification of very fine color grades in HLS space, especially for colors located in the upper half of the HLS cylinder.

Both the HSV and HLS color spaces are widely used in practice; for instance, for selecting colors in image editing and graphics design applications. In digital image processing, they are also used for color keying (i.e., isolating objects according to their hue) on a homogeneously colored background where the brightness is not necessarily constant.


Desaturation in HSV/HLS color space

Desaturation of color images (cf. Sec. 12.2.2) represented in HSV or HLS color space is trivial since color saturation is available as a separate component. In particular, pixels with zero saturation are uncolored or gray. For example, HSV colors can be gradually or fully desaturated by simply multiplying the component S by a fixed saturation factor $s \in [0, 1]$ and keeping H, V unchanged, that is,

$$ H_{\mathrm{desat}} = H, \qquad S_{\mathrm{desat}} = s \cdot S, \qquad V_{\mathrm{desat}} = V, \tag{12.34} $$

which works analogously with HLS colors. While Eqn. (12.34) applies equally to all colors, it might be interesting to selectively modify only colors with certain hues. This is easily accomplished by replacing the fixed saturation factor s by a hue-dependent function f(H) (see also Exercise 12.6).
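As a sketch of Eqn. (12.34), the following uses the standard AWT methods Color.RGBtoHSB() and Color.HSBtoRGB(); AWT's "HSB" is the HSV model described here. The class name is hypothetical. Replacing the constant s by a hue-dependent weight f(H) would give the selective variant mentioned above.

```java
import java.awt.Color;

/** Uniform desaturation in HSV space, Eqn. (12.34): S_desat = s * S (sketch). */
final class HsvDesaturate {

    /**
     * Desaturates one RGB color (components in [0, 255]) by the factor s in [0, 1],
     * keeping hue and value unchanged. Returns a packed 0xRRGGBB int (alpha cleared).
     */
    static int desaturate(int r, int g, int b, float s) {
        float[] hsv = Color.RGBtoHSB(r, g, b, null);            // {H, S, V}, each in [0, 1]
        int argb = Color.HSBtoRGB(hsv[0], s * hsv[1], hsv[2]);  // H and V kept as they are
        return argb & 0xFFFFFF;                                 // drop the alpha byte
    }
}
```

With s = 0 every color becomes a gray value whose level equals its HSV value V, e.g., (200, 50, 50) becomes (200, 200, 200).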


12.2.4  TV  Component  Color  Spaces — YUV,  YIQ,  and 
YCbCr 

These color spaces are an integral part of the standards surrounding the recording, storage, transmission, and display of television signals. YUV and YIQ are the fundamental color-encoding methods for the analog NTSC and PAL systems, and YCbCr is a part of the international standards governing digital television [114]. All of these color spaces have in common the idea of separating the luminance component Y from two chroma components and, instead of directly encoding colors, encoding color differences. In this way, compatibility with legacy black and white systems is maintained while at the same time the bandwidth of the signal can be optimized by using different transmission bandwidths for the brightness and the color components. Since the human visual system is not able to perceive detail in the color components as well as it does in the intensity part of a video signal, the amount of information, and consequently bandwidth, used in the color channel can be reduced to approximately 1/4 of that used for the intensity component. This fact is also exploited when compressing digital still images; for example, the JPEG codec converts RGB images to YCbCr. For this reason, these color spaces are important in digital image processing, even though raw YIQ or YUV images are rarely encountered in practice.


YUV 

YUV  is  the  basis  for  the  color  encoding  used  in  analog  television  in 
both  the  North  American  NTSC  and  the  European  PAL  systems. 
The  luminance  component  Y  is  computed,  just  as  in  Eqn.  (12.9), 
from  the  RGB  components  as 


$$ Y = 0.299 \cdot R + 0.587 \cdot G + 0.114 \cdot B \tag{12.35} $$

under the assumption that the RGB values have already been gamma corrected according to the TV encoding standard ($\gamma_{\mathrm{NTSC}} = 2.2$ and $\gamma_{\mathrm{PAL}} = 2.8$, see Ch. 4, Sec. 4.7) for playback. The UV components are computed from a weighted difference between the luminance and the blue or red components as

$$ U = 0.492 \cdot (B - Y) \quad \text{and} \quad V = 0.877 \cdot (R - Y), \tag{12.36} $$

and the entire transformation from RGB to YUV is

$$ \begin{pmatrix} Y \\ U \\ V \end{pmatrix} = \begin{pmatrix} 0.299 & 0.587 & 0.114 \\ -0.147 & -0.289 & 0.436 \\ 0.615 & -0.515 & -0.100 \end{pmatrix} \cdot \begin{pmatrix} R \\ G \\ B \end{pmatrix}. \tag{12.37} $$
Fig. 12.18

Examples of the color distribution of natural images in different color spaces. Original images (a); color distribution in HSV (b) and YUV space (c). See Fig. 12.9 for the corresponding distributions in RGB color space.
The transformation from YUV back to RGB is found by inverting the matrix in Eqn. (12.37):

$$ \begin{pmatrix} R \\ G \\ B \end{pmatrix} = \begin{pmatrix} 1.000 & 0.000 & 1.140 \\ 1.000 & -0.395 & -0.581 \\ 1.000 & 2.032 & 0.000 \end{pmatrix} \cdot \begin{pmatrix} Y \\ U \\ V \end{pmatrix}. \tag{12.38} $$

The  color  distributions  in  YUV-space  for  a  set  of  natural  images  are 
shown  in  Fig.  12.18. 
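Instead of hard-coding the rounded inverse matrix of Eqn. (12.38), one can also solve Eqns. (12.35)-(12.36) directly for R, G, B: R = Y + V/0.877, B = Y + U/0.492, and G then follows from Eqn. (12.35). The sketch below does this (hypothetical class name, components as doubles in [0, 1]); the resulting coefficients 1/0.877 ≈ 1.140, 1/0.492 ≈ 2.032, −0.114/(0.587 · 0.492) ≈ −0.395, and −0.299/(0.587 · 0.877) ≈ −0.581 reproduce Eqn. (12.38) up to rounding.

```java
/** RGB <-> YUV following Eqns. (12.35)-(12.38); components as doubles (sketch). */
final class Yuv {

    /** Returns {Y, U, V} for gamma-corrected R, G, B in [0, 1]. */
    static double[] rgbToYuv(double r, double g, double b) {
        double y = 0.299 * r + 0.587 * g + 0.114 * b;  // Eqn. (12.35)
        double u = 0.492 * (b - y);                    // Eqn. (12.36)
        double v = 0.877 * (r - y);
        return new double[] {y, u, v};
    }

    /** Exact inverse of rgbToYuv, obtained by solving Eqns. (12.35)-(12.36). */
    static double[] yuvToRgb(double y, double u, double v) {
        double r = y + v / 0.877;                        // ≈ Y + 1.140 V
        double b = y + u / 0.492;                        // ≈ Y + 2.032 U
        double g = (y - 0.299 * r - 0.114 * b) / 0.587;  // solve Eqn. (12.35) for G
        return new double[] {r, g, b};
    }
}
```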

YIQ 

The original NTSC system used a variant of YUV called YIQ (I for “in-phase”, Q for “quadrature”), where both the U and V color vectors were rotated and mirrored such that

$$ \begin{pmatrix} I \\ Q \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \cdot \begin{pmatrix} \cos\beta & \sin\beta \\ -\sin\beta & \cos\beta \end{pmatrix} \cdot \begin{pmatrix} U \\ V \end{pmatrix}, \tag{12.39} $$

where β = 0.576 rad (33°). The Y component is the same as in YUV. Although YIQ has certain advantages with respect to bandwidth requirements, it has been completely replaced by YUV [124, p. 240].
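A minimal sketch of Eqn. (12.39), rotating (U, V) by β and then swapping the two results (hypothetical class name). For pure red (R = 1, G = B = 0, hence Y = 0.299), it yields I ≈ 0.596 and Q ≈ 0.211, which agrees with the commonly quoted NTSC YIQ coefficients.

```java
/** YUV -> YIQ following Eqn. (12.39): rotate (U, V) by beta, then swap (sketch). */
final class Yiq {
    static final double BETA = 0.576;  // radians, about 33 degrees

    /** Returns {I, Q} for the given chroma values U, V. */
    static double[] uvToIq(double u, double v) {
        double i = -u * Math.sin(BETA) + v * Math.cos(BETA);  // second row of the rotation
        double q =  u * Math.cos(BETA) + v * Math.sin(BETA);  // first row of the rotation
        return new double[] {i, q};
    }
}
```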


YCbCr 

The YCbCr color space is an internationally standardized variant of YUV that is used for both digital television and image compression (e.g., in JPEG). The chroma components $C_b$, $C_r$ are (similar to U, V) difference values between the luminance and the blue and red components, respectively. In contrast to YUV, the weights of the RGB components for the luminance Y depend explicitly on the coefficients used for the chroma values $C_b$ and $C_r$ [197, p. 16]. For arbitrary weights $w_B$, $w_R$, the transformation is defined as

$$ Y = w_R \cdot R + (1 - w_B - w_R) \cdot G + w_B \cdot B, \tag{12.40} $$

$$ C_b = \frac{0.5}{1 - w_B} \cdot (B - Y), \tag{12.41} $$

$$ C_r = \frac{0.5}{1 - w_R} \cdot (R - Y), \tag{12.42} $$

with $w_R = 0.299$ and $w_B = 0.114$ ($w_G = 0.587$)9 according to ITU10 recommendation BT.601 [123]. Analogously, the reverse mapping from YCbCr to RGB is


$$ R = Y + \frac{1 - w_R}{0.5} \cdot C_r, \tag{12.43} $$

$$ G = Y - \frac{w_B \cdot (1 - w_B) \cdot C_b + w_R \cdot (1 - w_R) \cdot C_r}{0.5 \cdot (1 - w_B - w_R)}, \tag{12.44} $$

$$ B = Y + \frac{1 - w_B}{0.5} \cdot C_b. \tag{12.45} $$


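In matrix-vector notation this gives the linear transformation

$$ \begin{pmatrix} Y \\ C_b \\ C_r \end{pmatrix} = \begin{pmatrix} 0.299 & 0.587 & 0.114 \\ -0.169 & -0.331 & 0.500 \\ 0.500 & -0.419 & -0.081 \end{pmatrix} \cdot \begin{pmatrix} R \\ G \\ B \end{pmatrix}, \tag{12.46} $$

$$ \begin{pmatrix} R \\ G \\ B \end{pmatrix} = \begin{pmatrix} 1.000 & 0.000 & 1.403 \\ 1.000 & -0.344 & -0.714 \\ 1.000 & 1.773 & 0.000 \end{pmatrix} \cdot \begin{pmatrix} Y \\ C_b \\ C_r \end{pmatrix}. \tag{12.47} $$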


Different weights are recommended based on how the color space is used; for example, ITU-BT.709 [122] recommends $w_R = 0.2126$ and $w_B = 0.0722$ to be used in digital HDTV production. The values of U, V, I, Q, and $C_b$, $C_r$ may be positive or negative. To encode $C_b$, $C_r$ values as digital numbers, a suitable offset is typically added to obtain positive-only values, for example, $128 = 2^7$ in the case of 8-bit components.
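The following sketch implements Eqns. (12.40)-(12.45) for arbitrary weights and adds the 8-bit chroma offset of 128 mentioned above. It assumes full-range components in [0, 255] with only that offset (as JPEG/JFIF uses); the studio-range scaling used in digital video is not modeled. The class name is hypothetical.

```java
/** RGB <-> YCbCr for arbitrary weights wR, wB, Eqns. (12.40)-(12.45) (sketch). */
final class YCbCr {
    final double wR, wB, wG;

    YCbCr(double wR, double wB) {
        this.wR = wR;
        this.wB = wB;
        this.wG = 1 - wR - wB;  // the three weights sum to 1
    }

    /** R, G, B in [0, 255] -> {Y, Cb + 128, Cr + 128}. */
    double[] fromRgb(double r, double g, double b) {
        double y  = wR * r + wG * g + wB * b;      // Eqn. (12.40)
        double cb = 0.5 / (1 - wB) * (b - y);      // Eqn. (12.41)
        double cr = 0.5 / (1 - wR) * (r - y);      // Eqn. (12.42)
        return new double[] {y, cb + 128, cr + 128};
    }

    /** Inverse of fromRgb; expects chroma values carrying the +128 offset. */
    double[] toRgb(double y, double cbOffset, double crOffset) {
        double cb = cbOffset - 128, cr = crOffset - 128;
        double r = y + (1 - wR) / 0.5 * cr;                                    // Eqn. (12.43)
        double g = y - (wB * (1 - wB) * cb + wR * (1 - wR) * cr) / (0.5 * wG); // Eqn. (12.44)
        double b = y + (1 - wB) / 0.5 * cb;                                    // Eqn. (12.45)
        return new double[] {r, g, b};
    }
}
```

For example, new YCbCr(0.299, 0.114) corresponds to BT.601 and new YCbCr(0.2126, 0.0722) to BT.709; rounding and clamping to 8-bit integers are left out.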

Figure 12.19 shows the three color spaces YUV, YIQ, and YCbCr together for comparison. The U, V, I, Q, and Cb, Cr values in the

9 $w_R + w_G + w_B = 1$.

10  International  Telecommunication  Union  (www.itu.int). 


Fig.  12.19 

Comparing  YUV-,  YIQ-, 
and  YCbCr  values.  The 
Y  values  are  identical 
in  all  three  color  spaces. 


right  two  frames  have  been  offset  by  128  so  that  the  negative  values 
are  visible.  Thus  a  value  of  zero  is  represented  as  medium  gray  in 
these  images.  The  YCbCr  encoding  is  practically  indistinguishable 
from  YUV  in  these  images  since  they  both  use  very  similar  weights 
for  the  color  components. 

12.2.5  Color  Spaces  for  Printing — CMY  and  CMYK 

In contrast to the additive RGB color scheme (and its various color models), color printing makes use of a subtractive color scheme, where each printed color reduces the intensity of the reflected light at that location. Color printing requires a minimum of three primary colors; traditionally cyan (C), magenta (M), and yellow (Y)11 have been used.

Using subtractive color mixing on a white background, C = M = Y = 0 (no ink) results in the color white and C = M = Y = 1 (complete saturation of all three inks) in the color black. A cyan-colored ink will absorb red (R) most strongly, magenta absorbs green

11  Note  that  in  this  case  Y  stands  for  yellow  and  is  unrelated  to  the  Y 
luma  or  luminance  component  in  YUV  or  YCbCr. 
(G), and yellow absorbs blue (B). The simplest form of the CMY model is defined as

$$ C = 1 - R, \qquad M = 1 - G, \qquad Y = 1 - B. \tag{12.48} $$

In practice, the color produced by fully saturating all three inks is not physically a true black. Therefore, the three primary colors C, M, Y are usually supplemented with a black ink (K) to increase the color range and coverage (gamut). In the simplest case, the amount of black is

$$ K = \min(C, M, Y). \tag{12.49} $$

With rising levels of black, however, the intensity of the C, M, Y components can be gradually reduced. Many methods for reducing the primary dyes have been proposed, and we look at three of them in the following.


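CMY→CMYK conversion (version 1)

In this simple variant the C, M, Y values are reduced linearly with increasing K (Eqn. (12.49)), which yields the modified components as

$$ \begin{pmatrix} C_1 \\ M_1 \\ Y_1 \\ K_1 \end{pmatrix} = \begin{pmatrix} C - K \\ M - K \\ Y - K \\ K \end{pmatrix}. \tag{12.50} $$

CMY→CMYK conversion (version 2)

The second variant corrects the color by scaling the reduced components C − K, M − K, Y − K by the factor s = 1/(1 − K), resulting in stronger colors in the dark areas of the image:

$$ \begin{pmatrix} C_2 \\ M_2 \\ Y_2 \\ K_2 \end{pmatrix} = \begin{pmatrix} (C - K) \cdot s \\ (M - K) \cdot s \\ (Y - K) \cdot s \\ K \end{pmatrix}, \quad \text{with} \quad s = \begin{cases} \dfrac{1}{1 - K} & \text{for } K < 1, \\ 1 & \text{otherwise}. \end{cases} \tag{12.51} $$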


In both versions, the K component (as defined in Eqn. (12.49)) is used directly without modification, and all gray tones (that is, when R = G = B) are printed using black ink K, without any contribution from C, M, or Y.
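Versions 1 and 2 can be sketched as follows, assuming R, G, B in [0, 1]; the class name is hypothetical. Both start from the simple CMY model (12.48) and the black amount (12.49).

```java
/** RGB -> CMYK, versions 1 and 2 (Eqns. (12.48)-(12.51)); all values in [0, 1] (sketch). */
final class Cmyk {

    /** Version 1, Eqn. (12.50): subtract K linearly from C, M, Y. */
    static double[] version1(double r, double g, double b) {
        double c = 1 - r, m = 1 - g, y = 1 - b;     // Eqn. (12.48)
        double k = Math.min(c, Math.min(m, y));     // Eqn. (12.49)
        return new double[] {c - k, m - k, y - k, k};
    }

    /** Version 2, Eqn. (12.51): additionally scale the reduced components by s = 1/(1 - K). */
    static double[] version2(double r, double g, double b) {
        double[] v1 = version1(r, g, b);
        double k = v1[3];
        double s = (k < 1) ? 1 / (1 - k) : 1;       // s = 1 for pure black
        return new double[] {v1[0] * s, v1[1] * s, v1[2] * s, k};
    }
}
```

For a gray such as (0.4, 0.4, 0.4), both versions return (0, 0, 0, 0.6), i.e., black ink only; for the dark red (0.5, 0.1, 0.1), version 1 gives (0, 0.4, 0.4, 0.5) and version 2 the stronger (0, 0.8, 0.8, 0.5).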

While both of these simple definitions are widely used, neither one produces high-quality results. Figure 12.20(a) compares the result from version 2 with that produced by Adobe Photoshop (Fig. 12.20(c)). The difference in the cyan component C is particularly noticeable, as is the excessive amount of black (K) in the brighter areas of the image.

In practice, the required amounts of black K and C, M, Y depend so strongly on the printing process and the type of paper used that print jobs are routinely calibrated individually.
Fig. 12.20

RGB→CMYK conversion comparison. Simple conversion using Eqn. (12.51) (a), applying the undercolor-removal and black-generation functions of Eqn. (12.52) (b), and results obtained with Adobe Photoshop (c). The color intensities are shown inverted, that is, darker areas represent higher CMYK color values. The simple conversion (a), in comparison with Photoshop’s result (c), shows strong deviations in all color components, C and K in particular. The results in (b) are close to Photoshop’s and could be further improved by tuning the corresponding function parameters.

Panels: (a) Version 2 (Eqn. (12.51)), (b) Version 3 (Eqn. (12.52)), (c) Adobe Photoshop.
CMY→CMYK conversion (version 3)

In print production, special transfer functions are applied to tune the results. For example, the Adobe PostScript interpreter [135, p. 345] specifies an undercolor-removal function $f_{\mathrm{UCR}}(K)$ for gradually reducing the CMY components and a separate black-generation function $f_{\mathrm{BG}}(K)$ for controlling the amount of black. These functions are used in the form

$$ \begin{pmatrix} C_3 \\ M_3 \\ Y_3 \\ K_3 \end{pmatrix} = \begin{pmatrix} C - f_{\mathrm{UCR}}(K) \\ M - f_{\mathrm{UCR}}(K) \\ Y - f_{\mathrm{UCR}}(K) \\ f_{\mathrm{BG}}(K) \end{pmatrix}, \tag{12.52} $$


where K = min(C, M, Y), as defined in Eqn. (12.49). The functions $f_{\mathrm{UCR}}$ and $f_{\mathrm{BG}}$ are usually nonlinear, and the resulting values $C_3, M_3, Y_3, K_3$ are scaled (typically by means of clamping) to the interval [0, 1].

Fig. 12.21

Examples of undercolor-removal function $f_{\mathrm{UCR}}$ (Eqn. (12.53)) and black-generation function $f_{\mathrm{BG}}$ (Eqn. (12.54)). The parameter settings are $s_K = 0.1$, $K_0 = 0.3$, and $K_{\max} = 0.9$. (Plot of f(K) over K in [0, 1].)

The example shown in Fig. 12.20(b) was produced to approximate the results of Adobe Photoshop using the definitions


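$$ f_{\mathrm{UCR}}(K) = s_K \cdot K, \tag{12.53} $$

$$ f_{\mathrm{BG}}(K) = \begin{cases} 0 & \text{for } K < K_0, \\ K_{\max} \cdot \dfrac{K - K_0}{1 - K_0} & \text{for } K \ge K_0, \end{cases} \tag{12.54} $$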


where $s_K = 0.1$, $K_0 = 0.3$, and $K_{\max} = 0.9$ (see Fig. 12.21). With this definition, $f_{\mathrm{UCR}}$ reduces the CMY components by 10% of the K value (by Eqn. (12.52)), which mostly affects the dark areas of the image with high K values. The effect of the function $f_{\mathrm{BG}}$ (Eqn. (12.54)) is that for values of $K < K_0$ (i.e., in the light areas of the image) no black ink is added at all. In the interval $K = K_0, \ldots, 1.0$, the black component is increased linearly up to the maximum value $K_{\max}$. The result in Fig. 12.20(b) is relatively close to the CMYK component values produced by Photoshop12 in Fig. 12.20(c). It could be further improved by adjusting the function parameters $s_K$, $K_0$, and $K_{\max}$ (Eqns. (12.53)-(12.54)).
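A minimal sketch of version 3 with the parameter values given above, using clamping to [0, 1] as the scaling step; the class and method names are hypothetical.

```java
/** RGB -> CMYK, version 3 (Eqns. (12.52)-(12.54)) with clamping to [0, 1] (sketch). */
final class CmykUcr {
    static final double S_K = 0.1, K_0 = 0.3, K_MAX = 0.9;  // parameter values from the text

    /** Undercolor removal, Eqn. (12.53). */
    static double fUCR(double k) {
        return S_K * k;
    }

    /** Black generation, Eqn. (12.54). */
    static double fBG(double k) {
        return (k < K_0) ? 0 : K_MAX * (k - K_0) / (1 - K_0);
    }

    static double clamp(double x) {
        return Math.max(0, Math.min(1, x));
    }

    /** r, g, b in [0, 1] -> {C3, M3, Y3, K3}, Eqn. (12.52). */
    static double[] convert(double r, double g, double b) {
        double c = 1 - r, m = 1 - g, y = 1 - b;      // Eqn. (12.48)
        double k = Math.min(c, Math.min(m, y));      // Eqn. (12.49)
        double ucr = fUCR(k);
        return new double[] {clamp(c - ucr), clamp(m - ucr), clamp(y - ucr), clamp(fBG(k))};
    }
}
```

For black (0, 0, 0) this yields (0.9, 0.9, 0.9, 0.9), while a light gray such as (0.8, 0.8, 0.8) receives no black ink at all, since K = 0.2 < K_0.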

Even though this last variant (3) for converting RGB to CMYK gives better results, it is only a rough approximation and still too imprecise for professional work. As we discuss in Chapter 14, technically correct color conversions need to be based on precise, “colorimetric” grounds.


12.3  Statistics  of  Color  Images 

12.3.1  How  Many  Different  Colors  are  in  an  Image? 

A minor but frequent task in the context of color images is to determine how many different colors are contained in a given image.

12  Actually  Adobe  Photoshop  does  not  convert  directly  from  RGB  to 
CMYK.  Instead,  it  first  converts  to,  and  then  from,  the  CIELAB  color 
space  (see  Ch.  14,  Sec.  14.1). 
Prog. 12.10

Counting the colors contained in an RGB image. The method countColors() first creates a copy of the 1D RGB (int) pixel array (line 3), then sorts that array, and finally counts the transitions between contiguous blocks of identical colors.


One way of doing this would be to create and fill a histogram array with one integer element for each color and subsequently count all histogram cells with values greater than zero. But since a 24-bit RGB color image potentially contains $2^{24}$ = 16,777,216 colors, the resulting histogram array (with a size of 64 megabytes) would be larger than the image itself in most cases!

A simple solution to this problem is to sort the pixel values in the (1D) pixel array such that all identical colors are placed next to each other. The sorting order is of course completely irrelevant, and the number of contiguous color blocks in the sorted pixel vector corresponds to the number of different colors in the image. This number can be obtained by simply counting the transitions between neighboring color blocks, as shown in Prog. 12.10. Of course, we do not want to sort the original pixel array (which would destroy the image) but a copy of it, which can be obtained with Java’s clone() method.13 Sorting of the 1D array in Prog. 12.10 is accomplished (in line 4) with the generic Java method Arrays.sort(), which is implemented very efficiently.


1   int countColors (ColorProcessor cp) {
2     // duplicate the pixel array and sort it
3     int[] pixels = ((int[]) cp.getPixels()).clone();
4     Arrays.sort(pixels);  // requires java.util.Arrays
5
6     int k = 1;  // color count (image contains at least 1 color)
7     for (int i = 0; i < pixels.length - 1; i++) {
8       if (pixels[i] != pixels[i + 1])
9         k = k + 1;
10    }
11    return k;
12  }
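For experimenting outside ImageJ, the same sort-and-count idea can be written for a plain int array of packed RGB values. This variant is a sketch with a hypothetical class name; it additionally masks out the upper (alpha) byte, on the assumption that only the lower 24 bits carry color, and returns 0 for an empty array.

```java
import java.util.Arrays;

/** Sort-based color counting (as in Prog. 12.10) on a plain packed-RGB int array (sketch). */
final class ColorCount {

    static int countColors(int[] pixels) {
        if (pixels.length == 0) return 0;             // no pixels, no colors
        int[] rgb = new int[pixels.length];           // work on a copy, never the image itself
        for (int i = 0; i < pixels.length; i++) {
            rgb[i] = pixels[i] & 0xFFFFFF;            // ignore the alpha byte
        }
        Arrays.sort(rgb);                             // identical colors become contiguous
        int k = 1;
        for (int i = 0; i < rgb.length - 1; i++) {
            if (rgb[i] != rgb[i + 1]) k++;            // a new block of identical colors starts
        }
        return k;
    }
}
```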


12.3.2  Color  Histograms 

We briefly touched on histograms of color images in Chapter 3, Sec. 3.5, where we only considered the 1D distributions of the image intensity and the individual color channels. For instance, the built-in ImageJ method getHistogram(), when applied to an object of type ColorProcessor, simply computes the intensity histogram of the corresponding gray values:

ColorProcessor cp;
int[] H = cp.getHistogram();

As an alternative, one could compute the individual intensity histograms of the three color channels, although (as discussed in Chapter 3, Sec. 3.5.2) these do not provide any information about the actual colors in this image. Similarly, of course, one could compute the distributions of the individual components of any other color space, such as HSV or CIELAB.
13 Java arrays implement the Cloneable interface.


A full histogram of an RGB image is 3D and, as noted earlier, consists of 256 × 256 × 256 = $2^{24}$ cells of type int (for 8-bit color components). Such a histogram is not only very large14 but also difficult to visualize.


2D  color  histograms 

Useful alternatives to the full 3D RGB histogram are 2D histogram projections (Fig. 12.22). Depending on the axis of projection, we obtain 2D histograms with coordinates red-green ($h_{RG}$), red-blue ($h_{RB}$), or green-blue ($h_{GB}$), respectively, with the values

$$ \begin{aligned} h_{RG}(r, g) &:= \text{number of pixels with } I(u,v) = (r, g, *), \\ h_{RB}(r, b) &:= \text{number of pixels with } I(u,v) = (r, *, b), \\ h_{GB}(g, b) &:= \text{number of pixels with } I(u,v) = (*, g, b), \end{aligned} \tag{12.55} $$

where * denotes an arbitrary component value. The result is, independent of the original image size, a set of 2D histograms of size 256 × 256 (for 8-bit RGB components), which can easily be visualized as images. Note that it is not necessary to obtain the full RGB histogram in order to compute the combined 2D histograms (see Prog. 12.11).
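To illustrate that each projection in Eqn. (12.55) needs only a single pass over the pixels, here is a sketch of $h_{RG}$ computed directly from packed pixel values, assuming the layout 0x??RRGGBB; the class name is hypothetical, and the other two projections differ only in which bytes are extracted.

```java
/** Red-green histogram h_RG (Eqn. (12.55)) from packed 0x??RRGGBB pixels (sketch). */
final class Hist2D {

    static int[][] histogramRG(int[] pixels) {
        int[][] h = new int[256][256];
        for (int p : pixels) {
            int r = (p >> 16) & 0xFF;   // red byte
            int g = (p >> 8) & 0xFF;    // green byte
            h[r][g]++;                  // blue is ignored (the "*" in Eqn. (12.55))
        }
        return h;
    }
}
```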




Fig. 12.22

2D RGB histogram projections. 3D RGB cube illustrating an image’s color distribution (a). The color points indicate the corresponding pixel colors and not the color frequency. The combined histograms for red-green ($h_{RG}$), red-blue ($h_{RB}$), and green-blue ($h_{GB}$) are 2D projections of the 3D histogram. The corresponding image is shown in Fig. 12.9(a).


As the examples in Fig. 12.23 show, the combined color histograms do, to a certain extent, express the color characteristics of an image. They are therefore useful, for example, to identify the coarse type of the depicted scene or to estimate the similarity between images (see also Exercise 12.8).


12.4  Exercises 

Exercise 12.1. Create an ImageJ plugin that rotates the individual components of an RGB color image; that is, R → G → B → R.

Exercise 12.2. Pseudocolors are sometimes used for displaying grayscale images (e.g., for viewing medical images with high dynamic

14  It  may  seem  a  paradox  that,  although  the  RGB  histogram  is  usually 
much  larger  than  the  image  itself,  the  histogram  is  not  sufficient  in 
general  to  reconstruct  the  original  image. 
Fig. 12.23

Combined color histogram examples. For better viewing, the images are inverted (dark regions indicate high frequencies) and the gray value corresponds to the logarithm of the histogram entries (scaled to the maximum entries).

Panels: (a) original images; (b) red-green histogram (R →, G ↑); (c) red-blue histogram (R →, B ↑); (d) green-blue histogram (G →, B ↑).


range). Create an ImageJ plugin for converting 8-bit grayscale images to an indexed image with 256 colors, simulating the hues of glowing iron (from dark red to yellow and white).

Exercise 12.3. Create an ImageJ plugin that shows the color table of an 8-bit indexed image as a new image with 16 × 16 rectangular color fields. Mark all unused color table entries in a suitable way. Look at Prog. 12.3 as a starting point.

Exercise 12.4. Show that a “desaturated” RGB pixel produced in the form (r, g, b) → (y, y, y), where y is the equivalent luminance value (see Eqn. (12.11)), has the luminance y as well.



int[][] get2dHistogram (ColorProcessor cp, int c1, int c2) {
    // c1, c2: component index R = 0, G = 1, B = 2
    int[] RGB = new int[3];
    int[][] h = new int[256][256];  // 2D histogram h[c1][c2]

    for (int v = 0; v < cp.getHeight(); v++) {
        for (int u = 0; u < cp.getWidth(); u++) {
            cp.getPixel(u, v, RGB);
            int i1 = RGB[c1];
            int i2 = RGB[c2];
            // increment the associated histogram cell
            h[i1][i2]++;
        }
    }
    return h;
}


Prog. 12.11

Java method get2dHistogram() for computing a combined 2D color histogram. The color components (histogram axes) are specified by the parameters c1 and c2. The color distribution h is returned as a 2D int array. The method is defined in class ColorStatistics (Prog. 12.10).


Exercise 12.5. Extend the ImageJ plugin for desaturating color images in Prog. 12.5 such that the image is only modified inside the user-selected region of interest (ROI).

Exercise 12.6. Write an ImageJ plugin that selectively desaturates an RGB image, preserving colors with a hue close to a given reference color $c_{\mathrm{ref}} = (R_{\mathrm{ref}}, G_{\mathrm{ref}}, B_{\mathrm{ref}})$, with (HSV) hue $H_{\mathrm{ref}}$ (see the example in Fig. 12.24). Transform the image to HSV and modify the colors (cf. Eqn. (12.34)) in the form

$$ \begin{pmatrix} H_{\mathrm{desat}} \\ S_{\mathrm{desat}} \\ V_{\mathrm{desat}} \end{pmatrix} = \begin{pmatrix} H \\ f(H) \cdot S \\ V \end{pmatrix}, \tag{12.56} $$


Fig. 12.24

Selective desaturation example. Original image with selected reference color $c_{\mathrm{ref}} = (250, 92, 150)$ (a), desaturated image (b). Gaussian saturation function f(H) (see Eqn. (12.58)) with reference hue $H_{\mathrm{ref}} = 0.9388$ and $\sigma = 0.1$ (c).
where f(H) is a smooth saturation function, for example, a Gaussian function of the form

$$ f(H) = e^{-\frac{(H - H_{\mathrm{ref}})^2}{2\sigma^2}} = g_\sigma(H - H_{\mathrm{ref}}), \tag{12.57} $$

with center $H_{\mathrm{ref}}$ and variance $\sigma^2$ (see Fig. 12.24(c)). Recall that the H component is circular in [0, 1). To obtain a continuous and periodic saturation function, we note that $H' = H - H_{\mathrm{ref}}$ is in the range (−1, 1) and reformulate f(H) as

$$ f(H) = \begin{cases} g_\sigma(H') & \text{for } -0.5 \le H' \le 0.5, \\ g_\sigma(H' + 1) & \text{for } H' < -0.5, \\ g_\sigma(H' - 1) & \text{for } H' > 0.5. \end{cases} \tag{12.58} $$


Verify the values of the function f(H), and check in particular that it is 1 for the reference color! What would be a good (synthetic) color image for validating the saturation function? Use ImageJ’s color picker (pipette) tool to specify the reference color $c_{\mathrm{ref}}$ interactively.15
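The periodic weighting of Eqn. (12.58) can be sketched as a small helper; the class and method names are hypothetical. It assumes the unnormalized Gaussian $g_\sigma(x) = e^{-x^2/(2\sigma^2)}$, so that f equals 1 at the reference hue, as the exercise requires.

```java
/** Periodic Gaussian saturation weight f(H), Eqns. (12.57)-(12.58) (sketch). */
final class HueWeight {

    /** Unnormalized Gaussian g_sigma(x) = exp(-x^2 / (2 sigma^2)); g_sigma(0) = 1. */
    static double g(double x, double sigma) {
        return Math.exp(-(x * x) / (2 * sigma * sigma));
    }

    /** f(H) for hues H, hRef in [0, 1); wraps H' = H - hRef into [-0.5, 0.5]. */
    static double f(double h, double hRef, double sigma) {
        double d = h - hRef;        // H' in (-1, 1)
        if (d < -0.5) d += 1;       // second case of Eqn. (12.58)
        else if (d > 0.5) d -= 1;   // third case of Eqn. (12.58)
        return g(d, sigma);
    }
}
```

With $H_{\mathrm{ref}} = 0.95$, the hues 0.05 and 0.85 both lie at circular distance 0.1 and therefore receive the same weight $e^{-0.5} \approx 0.61$ for σ = 0.1.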

Exercise  12.7.  Calculate  (analogous  to  Eqns.  (12.46)-(12.47))  the 
complete  transformation  matrices  for  converting  from  (linear)  RGB 
colors  to  YCbCr  for  the  ITU-BT.709  (HDTV)  standard  with  the 
coefficients  wR  =  0.2126,  wB  =  0.0722  and  wG  =  0.7152. 


Exercise 12.8. Determining the similarity between images of different sizes is a frequent problem (e.g., in the context of image databases). Color statistics are commonly used for this purpose because they facilitate a coarse classification of images, such as landscape images, portraits, etc. However, 2D color histograms (as described in Sec. 12.3.2) are usually too large and thus cumbersome to use for this purpose. A simple idea could be to split the 2D histograms or even the full RGB histogram into K regions (bins) and to combine the corresponding entries into a K-dimensional feature vector, which could be used for a coarse comparison. Develop a concept for such a procedure, and also discuss the possible problems.

Exercise  12.9.  Write  a  program  (plugin)  that  generates  a  sequence 
of  colors  with  constant  hue  and  saturation  but  different  brightness 
(value)  in  HSV  space.  Transform  these  colors  to  RGB  and  draw  them 
into  a  new  image.  Verify  (visually)  if  the  hue  really  remains  constant. 

Exercise 12.10. When applying any type of filter in HSV or HLS color space, one must keep in mind that the hue component H is circular in [0, 1) and thus shows a discontinuity at the 1 → 0 (360° → 0°) transition. For example, a linear filter would not take into account that H = 0.0 and H = 1.0 refer to the same hue (red) and thus cannot be applied directly to the H component. One solution is to filter the cosine and sine values of the H component (which really is an angle) instead, and to compose the filtered hue array from the filtered cos/sin values (see Ch. 15, Sec. 15.1.3 for details). Based on this idea, implement a variable-sized linear Gaussian filter (see Ch. 5, Sec. 5.2.7) for the HSV color space.
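The core of the cos/sin idea is that hues are averaged as angles rather than as plain numbers. The following sketch (hypothetical class name) shows this for an unweighted mean; a Gaussian filter would use the filter weights in the two sums instead. The result is undefined when both sums vanish, e.g., for two exactly opposite hues.

```java
/** Circular averaging of hue values via their cosine and sine components (sketch). */
final class CircularHue {

    /** Mean of hues in [0, 1), each treated as the angle 2*pi*H; result in [0, 1). */
    static double meanHue(double[] hues) {
        double sumCos = 0, sumSin = 0;
        for (double h : hues) {
            double angle = 2 * Math.PI * h;   // hue as an angle in radians
            sumCos += Math.cos(angle);
            sumSin += Math.sin(angle);
        }
        double mean = Math.atan2(sumSin, sumCos) / (2 * Math.PI);  // in [-0.5, 0.5]
        if (mean < 0) mean += 1;              // map back to [0, 1]
        return (mean >= 1) ? 0 : mean;        // guard against rounding up to exactly 1
    }
}
```

For the hues 0.95 and 0.05, the arithmetic mean would be 0.5 (cyan), whereas meanHue() returns a value at (or numerically next to) 0, i.e., red.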
15 The current color pick is returned by the ImageJ method Toolbar.getForegroundColor().


13 


Color  Quantization 


The  task  of  color  quantization  is  to  select  and  assign  a  limited  set 
of  colors  for  representing  a  given  color  image  with  maximum  fidelity. 
Assume, for example, that a graphic artist has created an illustration
with beautiful shades of color, for which he applied 150 different
crayons. His editor likes the result but, for some technical
reason, instructs the artist to draw the picture again, this time using
only 10 different crayons. The artist now faces the problem of color
quantization: his task is to select a “palette” of the 10 best-suited
crayons from his 150 and then choose the most similar color to redraw
each stroke of his original picture.

In the general case, the original image I contains a set of m different
colors $C = \{c_1, c_2, \ldots, c_m\}$, where m could be only a few or
several thousand, but at most $2^{24}$ for a 3 × 8-bit color image. The
goal is to replace the original colors by a (usually much smaller) set
of colors $C' = \{c'_1, c'_2, \ldots, c'_n\}$, with $n < m$. The difficulty lies in
the proper choice of the reduced color palette $C'$ such that damage
to the resulting image is minimized.

In practice, this problem is encountered, for example, when converting
from full-color images to images with lower pixel depth or to
index  (“palette”)  images,  such  as  the  conversion  from  24-bit  TIFF 
to  8-bit  GIF  images  with  only  256  (or  fewer)  colors.  Until  a  few 
years  ago,  a  similar  problem  had  to  be  solved  for  displaying  full-color 
images  on  computer  screens  because  the  available  display  memory 
was  often  limited  to  only  8  bits.  Today,  even  the  cheapest  display 
hardware  has  at  least  24-bit  depth  and  therefore  this  particular  need 
for  (fast)  color  quantization  no  longer  exists. 


13.1  Scalar  Color  Quantization 

Scalar (or uniform) quantization is a simple and fast process that is
independent of the image content. Each of the original color components
(e.g., $R_i$, $G_i$, $B_i$) in the range $[0, \ldots, m-1]$ is independently
converted to the new range $[0, \ldots, n-1]$, in the simplest case by a
linear quantization in the form

$$c'_q \leftarrow \left\lfloor c_q \cdot \frac{n}{m} \right\rfloor \qquad (13.1)$$

for all color components $q$. A typical example would be the conversion
of a color image with 3 × 12-bit components ($m = 4096$) to an
RGB image with 3 × 8-bit components ($n = 256$). In this case, each
original component value is multiplied by $n/m = 256/4096 = 1/16 = 2^{-4}$
and subsequently truncated, which is equivalent to an integer division
by 16 or simply ignoring the lower 4 bits of the corresponding
binary values (see Fig. 13.1(a)). The values m and n are usually, but not
always, the same for all color components.

Fig. 13.1

Scalar quantization of color components by truncating lower bits.
Quantization of 3 × 12-bit to 3 × 8-bit colors (a). Quantization of
3 × 8-bit to 3:3:2-packed 8-bit colors (b). The Java code segment in
Prog. 13.1 shows the corresponding sequence of bit operations.

An extreme (today rarely used) approach is to quantize 3 × 8-bit
color vectors to single-byte (8-bit) colors, where 3 bits are used for
red and green and only 2 bits for blue, as shown in Fig. 13.1(b). In
this case, $m = 256$ for all color components, $n_\text{red} = n_\text{green} = 8$, and
$n_\text{blue} = 4$.


Prog. 13.1

Quantization of a 3 × 8-bit RGB color pixel to 8 bits by 3:3:2 packing.

ColorProcessor cp = (ColorProcessor) ip;
int C = cp.getPixel(u, v);          // packed RGB value of pixel (u, v)
int R = (C & 0x00ff0000) >> 16;     // extract the 8-bit components
int G = (C & 0x0000ff00) >> 8;
int B = (C & 0x000000ff);
// 3:3:2 uniform color quantization: keep the upper 3, 3, and 2 bits
byte RGB =
    (byte) ((R & 0xE0) | ((G & 0xE0) >> 3) | ((B & 0xC0) >> 6));
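Eqn. (13.1) translates directly into integer arithmetic. The following Java sketch (class and method names are hypothetical) implements it for arbitrary m and n and, for the 12-bit to 8-bit case discussed above, compares it with simply dropping the lower 4 bits.

```java
/** Sketch of the linear scalar quantization in Eqn. (13.1). */
public class ScalarQuantization {

    /**
     * Maps a component value c in [0, m-1] to [0, n-1] as floor(c * n / m).
     * For non-negative values, Java's integer division truncation equals floor.
     */
    public static int quantize(int c, int m, int n) {
        return (int) (((long) c * n) / m);   // long avoids overflow for large m, n
    }

    public static void main(String[] args) {
        // 3 x 12-bit to 3 x 8-bit: m = 4096, n = 256
        for (int c : new int[] {0, 15, 16, 2048, 4095}) {
            int q = quantize(c, 4096, 256);
            // since m / n = 16, this should match dropping the lower 4 bits
            System.out.println(c + " -> " + q + " (c >> 4 = " + (c >> 4) + ")");
        }
    }
}
```

When m/n is a power of two, as here, a shift is the cheaper equivalent; the general form is only needed for other ratios.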


Unlike the techniques described in the following, scalar quantization
does not take into account the distribution of colors in the
original image. Scalar quantization is an optimal solution only if the
image colors are uniformly distributed within the RGB cube. However,
the typical color distribution in natural images is anything but
uniform,  with  some  regions  of  the  color  space  being  densely  populated 
and  many  colors  entirely  missing.  In  this  case,  scalar  quantization  is 
not  optimal  because  the  interesting  colors  may  not  be  sampled  with 
sufficient  density  while  at  the  same  time  colors  are  represented  that 
do  not  appear  in  the  image  at  all. 


Fig.  13.2 

Color distribution after a scalar 3:3:2 quantization. Original
color image (a). Distribution of the original 226,321 colors (b) and
the remaining 8 × 8 × 4 = 256 colors after 3:3:2 quantization (c) in
the RGB color cube.


13.2  Vector  Quantization 

Vector quantization does not treat the individual color components
separately as scalar quantization does; instead, each color vector
$c_i = (r_i, g_i, b_i)$ or pixel in the image is treated as a single entity.
Starting from a set of original color tuples $C = \{c_1, c_2, \ldots, c_m\}$,
the task of vector quantization is

a) to find a set of n representative color vectors $C' = \{c'_1, c'_2, \ldots, c'_n\}$
and

b) to replace each original color by one of the new color vectors
$c'_j \in C'$,

where n is usually predetermined ($n < m$) and the resulting deviation
from the original image should be minimal. This is a combinatorial
optimization problem in a rather large search space, which usually
makes it impossible to determine a global optimum in reasonable time.
Thus  all  of  the  following  methods  only  compute  a  “local”  optimum 
at  best. 
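Step (b) on its own, replacing each color by its closest representative, can be sketched as follows. This Java version assumes colors packed as 0xRRGGBB integers (as delivered by ImageJ's ColorProcessor) and uses the squared Euclidean distance in RGB, which selects the same nearest color as the Euclidean distance but avoids the square root. It is a brute-force search over all n palette entries per pixel; the names are hypothetical.

```java
/** Sketch of step (b): replace each color by its closest representative. */
public class NearestColor {

    /** Squared Euclidean distance between two packed 0xRRGGBB colors. */
    public static int distSq(int c1, int c2) {
        int dr = ((c1 >> 16) & 0xFF) - ((c2 >> 16) & 0xFF);
        int dg = ((c1 >> 8) & 0xFF) - ((c2 >> 8) & 0xFF);
        int db = (c1 & 0xFF) - (c2 & 0xFF);
        return dr * dr + dg * dg + db * db;
    }

    /** Returns the index of the palette entry closest to color c. */
    public static int closestIndex(int c, int[] palette) {
        int best = 0;
        int bestDist = Integer.MAX_VALUE;
        for (int j = 0; j < palette.length; j++) {
            int d = distSq(c, palette[j]);
            if (d < bestDist) {
                bestDist = d;
                best = j;
            }
        }
        return best;
    }

    /** Returns a new array with every pixel replaced by its closest palette color. */
    public static int[] quantize(int[] pixels, int[] palette) {
        int[] result = new int[pixels.length];
        for (int i = 0; i < pixels.length; i++) {
            result[i] = palette[closestIndex(pixels[i], palette)];
        }
        return result;
    }

    public static void main(String[] args) {
        int[] palette = {0x000000, 0xFF0000, 0xFFFFFF};
        int[] pixels = {0x100000, 0xE01010, 0xF0F0F0};
        for (int p : quantize(pixels, palette)) {
            System.out.printf("%06X%n", p);
        }
    }
}
```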


13.2.1  Populosity  Algorithm 

The populosity algorithm1 [104] selects the n most frequent colors in
the image as the representative set of color vectors $C'$. Being very
easy to implement, this procedure is quite popular. The method
described in Sec. 12.3.1, based on sorting the image pixels, can be
used to determine the n most frequent image colors. Each original
pixel is then replaced by the closest representative color vector in
$C'$, that is, the quantized color vector with the smallest distance in
the 3D color space.

1 Sometimes also called the “popularity” algorithm.

The algorithm performs adequately only as long as the original
image  colors  are  not  widely  scattered  through  the  color  space.  Some 
improvement  is  possible  by  grouping  similar  colors  into  larger  cells 
first  (by  scalar  quantization).  However,  a  less  frequent  (but  possibly 
important)  color  may  get  lost  whenever  it  is  not  sufficiently  similar 
to  any  of  the  n  most  frequent  colors. 
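A minimal Java sketch of the palette selection step of the populosity algorithm, assuming packed 0xRRGGBB pixels. Note that it counts colors with a HashMap rather than with the sorting-based method of Sec. 12.3.1, and it omits the subsequent nearest-color replacement; the names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Minimal sketch of the populosity palette selection (packed 0xRRGGBB colors). */
public class Populosity {

    /** Returns the (up to) n most frequent colors in pixels. */
    public static int[] mostFrequentColors(int[] pixels, int n) {
        Map<Integer, Integer> hist = new HashMap<>();
        for (int p : pixels) {
            hist.merge(p & 0xFFFFFF, 1, Integer::sum);   // ignore alpha bits
        }
        List<Map.Entry<Integer, Integer>> entries = new ArrayList<>(hist.entrySet());
        // sort by decreasing count; ties broken by color value for determinism
        entries.sort((a, b) -> !a.getValue().equals(b.getValue())
                ? Integer.compare(b.getValue(), a.getValue())
                : Integer.compare(a.getKey(), b.getKey()));
        int k = Math.min(n, entries.size());
        int[] palette = new int[k];
        for (int i = 0; i < k; i++) {
            palette[i] = entries.get(i).getKey();
        }
        return palette;
    }

    public static void main(String[] args) {
        int[] pixels = {0xFF0000, 0xFF0000, 0xFF0000, 0x00FF00, 0x00FF00, 0x0000FF};
        for (int c : mostFrequentColors(pixels, 2)) {
            System.out.printf("%06X%n", c);
        }
    }
}
```

The example also illustrates the weakness noted above: the single blue pixel is not among the two most frequent colors and is therefore lost from the palette.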


13.2.2  Median-Cut  Algorithm 

The median-cut algorithm [104] is considered a classical method for
color quantization that is implemented in many applications (including
ImageJ). As in the populosity method, a color histogram is first
computed for the original image, traditionally with a reduced number
of histogram cells (such as 32 × 32 × 32) for efficiency reasons.2 The
initial  histogram  volume  is  then  recursively  split  into  smaller  boxes 
until  the  desired  number  of  representative  colors  is  reached.  In  each 
recursive  step,  the  color  box  representing  the  largest  number  of  pixels 
is  selected  for  splitting.  A  box  is  always  split  across  the  longest  of  its 
three  axes  at  the  median  point,  such  that  half  of  the  contained  pixels 
remain  in  each  of  the  resulting  subboxes  (Fig.  13.3). 


Fig.  13.3 

Median-cut  algorithm.  The 
RGB color space is recursively split into smaller cubes
along  one  of  the  color  axes. 


The  result  of  this  recursive  splitting  process  is  a  partitioning  of 
the  color  space  into  a  set  of  disjoint  boxes,  with  each  box  ideally 
containing  the  same  number  of  image  pixels.  In  the  last  step,  a 
representative  color  vector  (e.g.,  the  mean  vector  of  the  contained 
colors)  is  computed  for  each  color  cube,  and  all  the  image  pixels  it 
contains  are  replaced  by  that  color. 

The  advantage  of  this  method  is  that  color  regions  of  high  pixel 
density  are  split  into  many  smaller  cells,  thus  reducing  the  overall 
quantization  error.  In  color  regions  of  low  density,  however,  relatively 
large  cubes  and  thus  large  color  deviations  may  occur  for  individual 
pixels. 

The median-cut method is described in detail in Algorithms 13.1–13.3
and a corresponding Java implementation can be found in the
source  code  section  of  this  book’s  website  (see  Sec.  13.2.5). 
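For orientation, the following compact Java sketch condenses the procedures of Algs. 13.1–13.3 (without the final image replacement step). It is not the book's library implementation. Colors are hypothetical {r, g, b, cnt} arrays. One deliberate deviation: a box is split at the middle index of its colors sorted along the longest dimension, rather than by the "≤ median" rule, so that both halves are guaranteed to be non-empty even when several colors share the median value. Like the algorithm, it takes the median over distinct colors, not over pixels.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Compact sketch of median-cut (Algs. 13.1-13.3); each color is {r, g, b, cnt}. */
public class MedianCutSketch {

    /** A color box: the colors it contains and its split level. */
    static final class Box {
        final List<int[]> colors;
        final int level;

        Box(List<int[]> colors, int level) {
            this.colors = colors;
            this.level = level;
        }

        /** Longest box dimension: 0 = red, 1 = green, 2 = blue (ties favor red, then green). */
        int maxDimension() {
            int[] min = {255, 255, 255};
            int[] max = {0, 0, 0};
            for (int[] c : colors) {
                for (int d = 0; d < 3; d++) {
                    min[d] = Math.min(min[d], c[d]);
                    max[d] = Math.max(max[d], c[d]);
                }
            }
            int best = 0;
            for (int d = 1; d < 3; d++) {
                if (max[d] - min[d] > max[best] - min[best]) {
                    best = d;
                }
            }
            return best;
        }

        /** Pixel-count-weighted average color of the box, rounded to integers. */
        int[] average() {
            long n = 0, sr = 0, sg = 0, sb = 0;
            for (int[] c : colors) {
                n += c[3];
                sr += (long) c[3] * c[0];
                sg += (long) c[3] * c[1];
                sb += (long) c[3] * c[2];
            }
            return new int[] {
                (int) Math.round((double) sr / n),
                (int) Math.round((double) sg / n),
                (int) Math.round((double) sb / n)};
        }
    }

    /** Returns up to kMax representative colors for a list of distinct colors. */
    public static List<int[]> findRepresentativeColors(List<int[]> colors, int kMax) {
        List<Box> boxes = new ArrayList<>();
        boxes.add(new Box(new ArrayList<>(colors), 0));
        while (boxes.size() < kMax) {
            // FindBoxToSplit: a box with at least 2 colors and minimal level
            Box b = null;
            for (Box x : boxes) {
                if (x.colors.size() >= 2 && (b == null || x.level < b.level)) {
                    b = x;
                }
            }
            if (b == null) {
                break;                                   // no more boxes to split
            }
            // SplitBox: sort along the longest dimension, cut at the middle index
            int d = b.maxDimension();
            List<int[]> sorted = new ArrayList<>(b.colors);
            sorted.sort(Comparator.comparingInt(c -> c[d]));
            int mid = sorted.size() / 2;
            boxes.remove(b);
            boxes.add(new Box(new ArrayList<>(sorted.subList(0, mid)), b.level + 1));
            boxes.add(new Box(new ArrayList<>(sorted.subList(mid, sorted.size())), b.level + 1));
        }
        List<int[]> result = new ArrayList<>();
        for (Box box : boxes) {
            result.add(box.average());               // AverageColor for each box
        }
        return result;
    }

    public static void main(String[] args) {
        List<int[]> colors = List.of(
            new int[] {255, 0, 0, 10}, new int[] {251, 4, 4, 10},
            new int[] {0, 0, 255, 5}, new int[] {10, 10, 245, 5});
        for (int[] c : findRepresentativeColors(colors, 2)) {
            System.out.println(c[0] + ", " + c[1] + ", " + c[2]);
        }
    }
}
```

The sketch uses List.of and therefore needs Java 9 or later. Copying sublists into new lists keeps each box independent of the sorted list it was cut from.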

2 This corresponds to a scalar prequantization of the color components,
which leads to additional quantization errors and thus produces suboptimal
results. This step seems unnecessary on modern computers and
should be avoided.


MedianCut(I, Kmax)
    I: color image, Kmax: max. number of quantized colors.
    Returns a new quantized image with at most Kmax colors.
    Cq ← FindRepresentativeColors(I, Kmax)
    return QuantizeImage(I, Cq)                     ▷ see Alg. 13.3

FindRepresentativeColors(I, Kmax)
    Returns a set of up to Kmax representative colors for the image I.
    Let C = {c_1, c_2, ..., c_K} be the set of distinct colors in I. Each of
    the K color elements in C is a tuple c_i = (red_i, grn_i, blu_i, cnt_i)
    consisting of the RGB color components (red, grn, blu) and
    the number of pixels (cnt) in I with that particular color.
    if |C| ≤ Kmax then
        return C
    else
        Create a color box b_0 at level 0 that contains all image colors
        C and make it the initial element in the set of color boxes B:
        b_0 ← CreateColorBox(C, 0)                  ▷ see Alg. 13.2
        B ← {b_0}                                   ▷ initial set of color boxes
        k ← 1
        done ← false
        while k < Kmax and ¬done do
            b ← FindBoxToSplit(B)                   ▷ see Alg. 13.2
            if b ≠ nil then
                (b_1, b_2) ← SplitBox(b)            ▷ see Alg. 13.2
                B ← B − {b}                         ▷ remove b from B
                B ← B ∪ {b_1, b_2}                  ▷ insert b_1, b_2 into B
                k ← k + 1
            else                                    ▷ no more boxes to split
                done ← true
        Collect the average colors of all color boxes in B:
        Cq ← {AverageColor(b_j) | b_j ∈ B}          ▷ see Alg. 13.3
        return Cq


13.2.3  Octree  Algorithm 

Similar  to  the  median-cut  algorithm,  this  method  is  also  based  on 
partitioning  the  3D  color  space  into  cells  of  varying  size.  The  octree 
algorithm  [82]  utilizes  a  hierarchical  structure,  where  each  cube  in 
color space may contain eight subcubes. This partitioning is represented
by a tree structure (octree) with a cube at each node that may
again  link  to  up  to  eight  further  nodes.  Thus  each  node  corresponds 
to  a  subrange  of  the  color  space  that  reduces  to  a  single  color  point  at 
a  certain  tree  depth  d  (e.g.,  d  =  8  for  a  3  x  8-bit  RGB  color  image). 
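A common way to realize this eight-way branching (an assumption here; the text does not prescribe it) is to take one bit from each 8-bit color component per tree level, starting with the most significant bit, and to combine the three bits into a child index from 0 to 7. After 8 levels the path identifies a single 24-bit color, matching d = 8 above. A hypothetical Java helper:

```java
/** Sketch: octree child index of an 8-bit RGB color at a given tree level. */
public class OctreeIndex {

    /**
     * Returns the child slot (0..7) at the given level (0 = root) by taking
     * bit (7 - level) of each component, with red as the most significant bit.
     */
    public static int childIndex(int r, int g, int b, int level) {
        int shift = 7 - level;
        return (((r >> shift) & 1) << 2) | (((g >> shift) & 1) << 1) | ((b >> shift) & 1);
    }

    public static void main(String[] args) {
        int r = 200, g = 100, b = 50;   // binary 11001000, 01100100, 00110010
        StringBuilder path = new StringBuilder();
        for (int level = 0; level < 8; level++) {
            path.append(childIndex(r, g, b, level)).append(' ');
        }
        System.out.println("path: " + path.toString().trim());
    }
}
```

Because each level consumes one bit per component, neighboring colors share long common prefixes of this path, which is why merging the children of a deep node combines similar colors.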

When an image is processed, the corresponding quantization tree,
which is initially empty, is created dynamically by evaluating all pixels
in sequence. Each pixel’s color tuple is inserted into the quantization
tree, while at the same time the number of nodes is limited to
a predefined value K (typically 256). When a new color tuple is
inserted and the tree does not contain this color, one of the following
situations can occur:



Alg.  13.1 

Median-cut color quantization (part 1). The input image I is
quantized to up to Kmax representative colors and a new, quantized
image is returned. The main work is done in procedure
FindRepresentativeColors(), which iteratively partitions the color
space into increasingly smaller boxes. It returns a set of representative
colors (Cq) that are subsequently used by procedure
QuantizeImage() to quantize the original image I. Note that
(unlike in most common implementations) no prequantization
is applied to the original image colors.

Alg.  13.2 

Median-cut color quantization (part 2).


CreateColorBox(C, m)
    Creates and returns a new color box containing the colors C and
    level m. A color box b is a tuple (colors, level, rmin, rmax, gmin,
    gmax, bmin, bmax), where colors is the set of image colors
    represented by the box, level denotes the split level, and
    rmin, ..., bmax describe the color boundaries of the box in RGB space.
    Find the RGB extrema of all colors in C:
    rmin, gmin, bmin ← +∞
    rmax, gmax, bmax ← −∞
    for all c ∈ C do
        rmin ← min(rmin, red(c))
        rmax ← max(rmax, red(c))
        gmin ← min(gmin, grn(c))
        gmax ← max(gmax, grn(c))
        bmin ← min(bmin, blu(c))
        bmax ← max(bmax, blu(c))
    b ← (C, m, rmin, rmax, gmin, gmax, bmin, bmax)
    return b

FindBoxToSplit(B)
    Searches the set of boxes B for a box to split and returns this
    box, or nil if no splittable box can be found.
    Find the set of color boxes that can be split (i.e., contain at least
    2 different colors):
    Bs ← {b | b ∈ B ∧ |colors(b)| ≥ 2}
    if Bs = {} then                                 ▷ no splittable box was found
        return nil
    else
        Select a box bx from Bs, such that level(bx) is a minimum:
        bx ← argmin_{b ∈ Bs} level(b)
        return bx

SplitBox(b)
    Splits the color box b at the median plane perpendicular to its
    longest dimension and returns a pair of new color boxes.
    m ← level(b)
    d ← FindMaxBoxDimension(b)                      ▷ see Alg. 13.3
    C ← colors(b)                                   ▷ the set of colors in box b
    From all colors in C determine the median of the color
    distribution along dimension d and split C into C_1, C_2:
           {c ∈ C | red(c) ≤ median_{c ∈ C}(red(c))}   for d = Red
    C_1 ←  {c ∈ C | grn(c) ≤ median_{c ∈ C}(grn(c))}   for d = Green
           {c ∈ C | blu(c) ≤ median_{c ∈ C}(blu(c))}   for d = Blue
    C_2 ← C \ C_1
    b_1 ← CreateColorBox(C_1, m + 1)
    b_2 ← CreateColorBox(C_2, m + 1)
    return (b_1, b_2)


1. If the number of nodes is less than K and no suitable node for
the color $c_i$ exists already, then a new node is created for $c_i$.

2. Otherwise (i.e., if the number of nodes is K), the existing nodes
at the maximum tree depth (which represent similar colors) are
merged into a common node.
AverageColor(b)
    Returns the average color c_avg for the pixels represented by the
    color box b.
    C ← colors(b)                                   ▷ the set of colors in box b
    n ← 0
    Σr ← 0, Σg ← 0, Σb ← 0
    for all c ∈ C do
        k ← cnt(c)
        n ← n + k
        Σr ← Σr + k · red(c)
        Σg ← Σg + k · grn(c)
        Σb ← Σb + k · blu(c)
    c_avg ← (Σr/n, Σg/n, Σb/n)
    return c_avg

FindMaxBoxDimension(b)
    Returns the largest dimension of the color box b (Red, Green, or
    Blue).
    dr ← rmax(b) − rmin(b)
    dg ← gmax(b) − gmin(b)
    db ← bmax(b) − bmin(b)
    dmax ← max(dr, dg, db)
    if dmax = dr then
        return Red
    else if dmax = dg then
        return Green
    else
        return Blue

QuantizeImage(I, Cq)
    Returns a new image with color pixels from I replaced by their
    closest representative colors in Cq.
    I' ← duplicate(I)                               ▷ create a new image
    for all image coordinates (u, v) do
        Find the quantization color in Cq that is “closest” to the current
        pixel color (e.g., using the Euclidean distance in RGB space):
        I'(u, v) ← argmin_{c ∈ Cq} ‖I(u, v) − c‖
    return I'



Alg.  13.3 

Median-cut  color  quantization 
(part  3). 


A  key  advantage  of  the  iterative  octree  method  is  that  the  number 
of  color  nodes  remains  limited  to  K  in  any  step  and  thus  the  amount 
of  required  storage  is  small.  The  final  replacement  of  the  image 
pixels  by  the  quantized  color  vectors  can  also  be  performed  easily 
and  efficiently  with  the  octree  structure  because  only  up  to  eight 
comparisons (one at each tree layer) are necessary to locate the best-matching
color for each pixel.

Figure 13.4 shows the resulting color distributions in RGB space
after applying the median-cut and octree algorithms. In both cases,
the original image (Fig. 13.2(a)) is quantized to 256 colors. Notice in
particular the dense placement of quantized colors in certain regions
of the green hues. For both algorithms and the (scalar) 3:3:2 quantization,
the resulting distances between the original pixels and the
quantized colors are shown in Fig. 13.5. The greatest error naturally
results from 3:3:2 quantization, because this method does not consider
the contents of the image at all. Compared with the median-cut
method, the overall error for the octree algorithm is smaller, although
the latter creates several large deviations, particularly inside the colored
foreground regions and the forest region in the background. In
general, however, the octree algorithm does not offer significant advantages
in terms of the resulting image quality over the simpler
median-cut algorithm.

Fig. 13.4

Color distribution after application of the median-cut (a)
and octree (b) algorithms. In both cases, the set of 226,321
colors in the original image (Fig. 13.2(a)) was reduced
to 256 representative colors.

Fig. 13.5

Quantization errors. Original image (a), distance between
original and quantized color pixels for scalar 3:3:2 quantization
(b), median-cut (c), and octree (d) algorithms.
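Distance images such as those in Fig. 13.5 can be computed pixel by pixel. A minimal Java sketch, assuming both images are given as arrays of packed 0xRRGGBB pixels of equal size and using the Euclidean distance in RGB space (an assumption, since the exact distance measure behind the figure is not stated here); the names are hypothetical.

```java
/** Sketch: per-pixel color distance between an original and a quantized image. */
public class QuantizationError {

    /** Euclidean RGB distance for each pair of packed 0xRRGGBB pixels. */
    public static double[] errorMap(int[] orig, int[] quant) {
        if (orig.length != quant.length) {
            throw new IllegalArgumentException("image sizes differ");
        }
        double[] err = new double[orig.length];
        for (int i = 0; i < orig.length; i++) {
            int dr = ((orig[i] >> 16) & 0xFF) - ((quant[i] >> 16) & 0xFF);
            int dg = ((orig[i] >> 8) & 0xFF) - ((quant[i] >> 8) & 0xFF);
            int db = (orig[i] & 0xFF) - (quant[i] & 0xFF);
            err[i] = Math.sqrt(dr * dr + dg * dg + db * db);
        }
        return err;
    }

    public static void main(String[] args) {
        int[] orig = {0x000000, 0x123456, 0xFFFFFF};
        int[] quant = {0x030400, 0x123456, 0xE0E0E0};
        for (double e : errorMap(orig, quant)) {
            System.out.printf("%.2f%n", e);
        }
    }
}
```

For display, such distance values would typically be scaled into an 8-bit gray range.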


13.2.4  Other  Methods  for  Vector  Quantization 

A suitable set of representative color vectors can usually be determined
without inspecting all pixels in the original image. It is often
sufficient to use only 10% of randomly selected pixels to obtain a high
probability that none of the important colors is lost.

In addition to the color quantization methods described already,
several other procedures and refined algorithms have been proposed.
These include statistical and clustering methods, such as the classical
k-means algorithm, but also the use of neural networks and genetic
algorithms. A good overview can be found in [219].
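Drawing such a random subset of pixels is straightforward. The Java sketch below (hypothetical names) samples with replacement using java.util.Random with a fixed seed for reproducibility. The 10% figure from the text is merely a parameter here; the sketch itself gives no guarantee that rare but important colors are included.

```java
import java.util.Random;

/** Sketch: draw a random subset of pixels (with replacement) for palette estimation. */
public class PixelSampling {

    /** Returns about fraction * pixels.length randomly chosen pixel values. */
    public static int[] sample(int[] pixels, double fraction, long seed) {
        if (pixels.length == 0) {
            throw new IllegalArgumentException("no pixels to sample");
        }
        int n = Math.max(1, (int) Math.round(fraction * pixels.length));
        Random rnd = new Random(seed);
        int[] result = new int[n];
        for (int i = 0; i < n; i++) {
            result[i] = pixels[rnd.nextInt(pixels.length)];   // uniform random index
        }
        return result;
    }

    public static void main(String[] args) {
        int[] pixels = new int[1000];
        for (int i = 0; i < pixels.length; i++) {
            pixels[i] = i % 7;                              // 7 distinct "colors"
        }
        int[] s = sample(pixels, 0.1, 42L);
        System.out.println("sampled " + s.length + " of " + pixels.length + " pixels");
    }
}
```

The sampled values can then be fed to any of the palette selection methods above instead of the full pixel array.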


13.2.5  Java  Implementation 

The Java implementation3 of the algorithms described in this chapter
consists of a common interface ColorQuantizer and the concrete
classes

•  MedianCutQuantizer, 

•  OctreeQuantizer. 

Program 13.2 shows a complete ImageJ plugin that employs the class
MedianCutQuantizer  for  quantizing  an  RGB  full-color  image  to  an 
indexed  image.  The  choice  of  data  structures  for  the  representation 
of  color  sets  and  the  implementation  of  the  associated  set  operations 
are  essential  to  achieve  good  performance.  The  data  structures  used 
in  this  implementation  are  illustrated  in  Fig.  13.6. 

Initially, the set of all colors contained in the original image (ip of
type ColorProcessor) is computed by new ColorHistogram(). The
result is an array imageColors of size K. Each cell of imageColors
refers to a colorNode object ($c_i$) that holds the associated color (red,
green, blue) and its frequency (cnt) in the image. Each colorBox
object (corresponding to a color box b in Alg. 13.1) selects a contiguous
range of image colors, bounded by the indices lower and
upper. The ranges of elements in imageColors indexed by different
colorBox objects never overlap. Each element in imageColors
is contained in exactly one colorBox; that is, the color boxes held
in colorSet (B in Alg. 13.1) form a partitioning of imageColors
(colorSet is implemented as a list of ColorBox objects). To split a
particular colorBox along a color dimension d = Red, Green, or Blue,
the corresponding subrange of elements in imageColors is sorted
with the property red, green, or blue, respectively, as the sorting
key. In Java, this is quite easy to implement using the standard
Arrays.sort() method and a dedicated Comparator object for each
color dimension. Finally, the method quantize() replaces each pixel
in ip by the closest color in colorSet.
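The subrange sorting described above can be sketched with the standard overload Arrays.sort(T[] a, int fromIndex, int toIndex, Comparator). The ColorNode class, the sortBox() method, and the inclusive lower/upper convention below are assumptions for illustration and need not match the actual classes in the book's library.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Sketch: sorting one color box's subrange of a shared color array. */
public class SubrangeSort {

    /** Hypothetical stand-in for a color node: RGB components and pixel count. */
    public static final class ColorNode {
        public final int red, grn, blu, cnt;

        public ColorNode(int red, int grn, int blu, int cnt) {
            this.red = red;
            this.grn = grn;
            this.blu = blu;
            this.cnt = cnt;
        }
    }

    // one dedicated comparator per color dimension
    static final Comparator<ColorNode> BY_RED = Comparator.comparingInt(c -> c.red);
    static final Comparator<ColorNode> BY_GRN = Comparator.comparingInt(c -> c.grn);
    static final Comparator<ColorNode> BY_BLU = Comparator.comparingInt(c -> c.blu);

    /** Sorts imageColors[lower..upper] (both inclusive) along d (0 = red, 1 = green, 2 = blue). */
    public static void sortBox(ColorNode[] imageColors, int lower, int upper, int d) {
        Comparator<ColorNode> cmp = (d == 0) ? BY_RED : (d == 1) ? BY_GRN : BY_BLU;
        Arrays.sort(imageColors, lower, upper + 1, cmp);   // toIndex is exclusive
    }

    public static void main(String[] args) {
        ColorNode[] imageColors = {
            new ColorNode(9, 0, 0, 1), new ColorNode(7, 0, 0, 1),
            new ColorNode(3, 0, 0, 1), new ColorNode(1, 0, 0, 1)};
        sortBox(imageColors, 1, 2, 0);   // only elements 1 and 2 are reordered
        for (ColorNode c : imageColors) {
            System.out.print(c.red + " ");
        }
        System.out.println();
    }
}
```

Because elements outside [lower, upper] are untouched, the other boxes' ranges stay valid, which is what allows all boxes to share a single array.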


13.3  Exercises 

Exercise  13.1.  Simplify  the  3:3:2  quantization  given  in  Prog.  13.1 
such  that  only  a  single  bit  mask/shift  step  is  performed  for  each  color 
component. 


3 Package imagingbook.pub.color.quantize.
Fig. 13.6

Data structures used in the
implementation of the median-cut
quantization algorithm
(class MedianCutQuantizer).
Exercise 13.2. The median-cut algorithm for color quantization
(Sec. 13.2.2) is implemented in the Independent JPEG Group’s4
libjpeg open-source software with the following modification: the
choice of the cube to be split next depends alternately on (a) the
number of contained image pixels and (b) the cube’s geometric volume.
Consider the possible motives and discuss examples where this
approach may offer an improvement over the original algorithm.

Exercise 13.3. The signal-to-noise ratio (SNR) is a common measure
for quantifying the loss of image quality introduced by color
quantization. It is defined as the ratio between the average signal
energy $P_\text{signal}$ and the average noise energy $P_\text{noise}$. For example,
given an original color image I and the associated quantized image
I', this ratio could be calculated as

$$\mathrm{SNR}(I, I') = \frac{P_\text{signal}}{P_\text{noise}} = \frac{\sum_{u=0}^{M-1} \sum_{v=0}^{N-1} \left\| I(u,v) \right\|^2}{\sum_{u=0}^{M-1} \sum_{v=0}^{N-1} \left\| I(u,v) - I'(u,v) \right\|^2}. \qquad (13.2)$$


Thus  all  deviations  between  the  original  and  the  quantized  image  are 
considered  “noise”.  The  signal-to-noise  ratio  is  usually  specified  on  a 
logarithmic  scale  with  the  unit  decibel  (dB),  that  is, 

$$\mathrm{SNR}_\text{log}(I, I') = 10 \cdot \log_{10}\big(\mathrm{SNR}(I, I')\big) \quad [\text{dB}]. \qquad (13.3)$$

Implement the calculation of the SNR, as defined in Eqns. (13.2)–(13.3),
for color images and compare the results for the median-cut
and the octree algorithms for the same number of target colors.
Prog. 13.2

Color quantization by the median-cut method (ImageJ plugin). This
example uses the class MedianCutQuantizer to quantize the original
full-color RGB image into (a) an indexed color image (of type
ByteProcessor) and (b) another RGB image (of type ColorProcessor).
Both images are finally displayed.

import ij.ImagePlus;
import ij.plugin.filter.PlugInFilter;
import ij.process.ByteProcessor;
import ij.process.ColorProcessor;
import ij.process.ImageProcessor;

import imagingbook.pub.color.quantize.ColorQuantizer;
import imagingbook.pub.color.quantize.MedianCutQuantizer;

public class Median_Cut_Quantization implements PlugInFilter {
    static int NCOLORS = 32;    // max. number of quantized colors

    public int setup(String arg, ImagePlus imp) {
        return DOES_RGB + NO_CHANGES;
    }

    public void run(ImageProcessor ip) {
        ColorProcessor cp = ip.convertToColorProcessor();
        int w = ip.getWidth();
        int h = ip.getHeight();

        // create a quantizer:
        ColorQuantizer q = new MedianCutQuantizer(cp, NCOLORS);

        // quantize cp to an indexed image:
        ByteProcessor idxIp = q.quantize(cp);
        (new ImagePlus("Quantized Index Image", idxIp)).show();

        // quantize cp to an RGB image:
        int[] rgbPix = q.quantize((int[]) cp.getPixels());
        ImageProcessor rgbIp = new ColorProcessor(w, h, rgbPix);
        (new ImagePlus("Quantized RGB Image", rgbIp)).show();
    }
}
14


Colorimetric  Color  Spaces 


In any application that requires precise, reproducible, and device-independent
presentation of colors, the use of calibrated color systems
is an absolute necessity. For example, color calibration is routinely
used throughout the digital print workflow but also in digital
film production, professional photography, image databases, etc.
One may have experienced how difficult it is, for example, to render
a good photograph on a color laser printer, and even the color reproduction
on monitors largely depends on the particular manufacturer
and computer system.

All the color spaces described in Chapter 12, Sec. 12.2, somehow
relate to the physical properties of some media device, such as the
specific colors of the phosphor coatings inside a CRT tube or the
colors of the inks used for printing. To make colors appear similar
or even identical on different media modalities, we need a representation
that is independent of how a particular device reproduces
these colors. Color systems that describe colors in a measurable,
device-independent fashion are called colorimetric or calibrated, and
the field of color science is traditionally concerned with the properties
and application of these color systems (see, e.g., [258] or [215] for
an overview). While several colorimetric standards exist, we focus
on the most widely used CIE systems in the remaining part of this
chapter.


14.1  CIE  Color  Spaces 

The XYZ color system, developed by the CIE (Commission Internationale
de l’Éclairage)1 in the 1920s and standardized in 1931, is the
foundation of most colorimetric color systems that are in use today
[195, p. 22].


1  International  Commission  on  Illumination  (www.cie.co.at). 


14.1.1  CIE  XYZ  Color  Space 

The CIE XYZ color scheme was developed after extensive measurements
of human visual perception under controlled conditions. It is
based on three imaginary primary colors X, Y, Z, which are chosen
such that all visible colors can be described as a summation of
positive-only components, where the Y component corresponds to
the perceived lightness or luminosity of a color. All visible colors
lie inside a 3D cone-shaped region (Fig. 14.1(a)), which, interestingly
enough, does not include the primary colors themselves.


Fig.  14.1 

The XYZ color space is defined
by the three imaginary
primary colors X, Y, Z, where
the  Y  dimension  corresponds 
to  the  perceived  luminance. 
All  visible  colors  are  contained 
inside  an  open,  cone-shaped 
volume  that  originates  at  the 
black  point  S  (a),  where  E 
denotes  the  axis  of  neutral 
(gray)  colors.  The  RGB  color 
space  maps  to  the  XYZ  space 
as  a  linearly  distorted  cube 
(b).  See  also  Fig.  14.5(a). 


Some common color spaces, and the RGB color space in particular,
conveniently relate to XYZ space by a linear coordinate transformation,
as described in Sec. 14.4. Thus, as shown in Fig. 14.1(b),
the  RGB  color  space  is  embedded  in  the  XYZ  space  as  a  distorted 
cube,  and  therefore  straight  lines  in  RGB  space  map  to  straight  lines 
in  XYZ  again.  The  CIE  XYZ  scheme  is  (similar  to  the  RGB  color 
space)  nonlinear  with  respect  to  human  visual  perception,  that  is,  a 
particular  fixed  distance  in  XYZ  is  not  perceived  as  a  uniform  color 
change  throughout  the  entire  color  space.  The  XYZ  coordinates  of 
the  RGB  color  cube  (based  on  the  primary  colors  defined  by  ITU-R 
BT.709)  are  listed  in  Table  14.1. 


14.1.2  CIE  x,y  Chromaticity 

As mentioned, the luminance in XYZ color space increases along the
Y axis, starting at the black point S located at the coordinate origin
(X = Y = Z = 0). The color hue is independent of the luminance
and thus independent of the Y value. To describe the corresponding
“pure” color hues and saturation in a convenient manner, the CIE
system also defines the three chromaticity values

$$x = \frac{X}{X+Y+Z}, \qquad y = \frac{Y}{X+Y+Z}, \qquad z = \frac{Z}{X+Y+Z}, \qquad (14.1)$$

where  (obviously)  x  +  y  +  z  =  1  and  thus  one  of  the  three  values  (e.g., 
z)  is  redundant.  Equation  (14.1)  describes  a  central  projection  from 

Table  14.1 

Coordinates  of  the  RGB  color 
cube  in  CIE  XYZ  space.  The 
X, Y, Z values refer to standard
(ITU-R BT.709) primaries and white
point D65 (see Table 14.2); x, y
denote the corresponding CIE
chromaticity coordinates.


X, Y, Z coordinates onto the plane

$$X + Y + Z = 1, \qquad (14.2)$$

with the origin S as the projection center (Fig. 14.2). Thus, for
an arbitrary XYZ color point $A = (X_a, Y_a, Z_a)$, the corresponding
chromaticity coordinates $a = (x_a, y_a, z_a)$ are found by intersecting
the line $\overline{SA}$ with the $X + Y + Z = 1$ plane (Fig. 14.2(a)). The
final x, y coordinates are the result of projecting these intersection
points onto the X/Y-plane (Fig. 14.2(b)) by simply dropping the Z
component $z_a$.

The result is the well-known horseshoe-shaped CIE x, y chromaticity diagram, which is shown in Fig. 14.2(c). Any x, y point in this diagram defines the hue and saturation of a particular color, but only the colors inside the horseshoe curve are potentially visible. Obviously an infinite number of X, Y, Z colors (with different luminance values) project to the same x, y, z chromaticity values, and the XYZ color coordinates thus cannot be uniquely reconstructed from given chromaticity values. Additional information is required. For example, it is common to specify the visible colors of the CIE system in the form Yxy, where Y is the original luminance component of the XYZ color. Given a pair of chromaticity values x, y (with y > 0) and an arbitrary Y value, the missing X, Z coordinates are obtained (using the definitions in Eqn. (14.1)) as

$$X = x \cdot \frac{Y}{y}, \qquad Z = z \cdot \frac{Y}{y} = (1 - x - y) \cdot \frac{Y}{y}. \qquad (14.3)$$
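Equations (14.1) and (14.3) translate directly into code. The following Java sketch is illustrative only: the class `Chromaticity` and its method names are hypothetical, and returning the neutral point E for the black point (where Eqn. (14.1) is undefined) is an arbitrary convention chosen for this sketch.

```java
/**
 * Sketch of CIE chromaticity computations (Eqns. 14.1 and 14.3).
 * Class and method names are illustrative, not taken from any library.
 */
final class Chromaticity {

    private Chromaticity() { }

    /** Central projection of (X, Y, Z) onto the plane X + Y + Z = 1 (Eqn. 14.1). */
    static double[] xyzToChromaticity(double X, double Y, double Z) {
        double sum = X + Y + Z;
        if (sum <= 0) {
            // Eqn. 14.1 is undefined at the black point S; returning the
            // neutral point E is an arbitrary convention of this sketch.
            return new double[] {1.0 / 3, 1.0 / 3, 1.0 / 3};
        }
        return new double[] {X / sum, Y / sum, Z / sum};
    }

    /** Reconstructs (X, Y, Z) from luminance Y and chromaticity (x, y), y > 0 (Eqn. 14.3). */
    static double[] yxyToXyz(double Y, double x, double y) {
        if (y <= 0) {
            throw new IllegalArgumentException("chromaticity y must be > 0");
        }
        double z = 1 - x - y; // the redundant third chromaticity value
        return new double[] {x * Y / y, Y, z * Y / y};
    }
}
```

For example, applying `xyzToChromaticity` to the D65 values of Table 14.2 (X = 0.95045, Y = 1.0, Z = 1.08905) gives x ≈ 0.3127, y ≈ 0.3290, and `yxyToXyz(1.0, x, y)` recovers the original XYZ triple.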


The CIE diagram not only yields an intuitive layout of color hues but also exhibits some remarkable formal properties. The x, y values along the outer horseshoe boundary correspond to monochromatic (“spectrally pure”), maximally saturated colors with wavelengths ranging from below 400 nm (violet) up to 780 nm (red). Thus the position of any color inside the x, y diagram can be specified with respect to any of these spectral colors on the boundary, except for the points on the connecting line (the “purple line”) between 380 and 780 nm, whose purple hues do not correspond to any single spectral color but can only be generated by mixing other colors.

The saturation of colors falls off continuously toward the “neutral point” (E) at the center of the horseshoe, with x = y = 1/3 (or X = Y = Z = 1, respectively) and zero saturation. All other colorless (i.e., gray) values also map to the neutral point, just as any set of colors with the same hue but different brightness corresponds to a single x, y point. All possible composite colors lie inside the convex hull specified by the coordinates of the primary colors of the CIE diagram and, in particular, complementary colors are located on straight lines that run diagonally through the white point.

Fig. 14.2: CIE x, y chromaticity diagram. For an arbitrary XYZ color point A = (X_a, Y_a, Z_a), the chromaticity values a = (x_a, y_a, z_a) are obtained by a central projection onto the plane X + Y + Z = 1 (a). The corner points of the RGB cube map to a triangle, and its white point W maps to the (colorless) neutral point E. The intersection points are then projected onto the X/Y plane (b) by simply dropping the Z component, which produces the familiar CIE chromaticity diagram shown in (c). The CIE diagram contains all visible color tones (hues and saturations) but no luminance information, with wavelengths in the range 380 to 780 nanometers. A particular color space is specified by at least three primary colors (tristimulus values; e.g., R, G, B), which define a triangle (convex hull) containing all representable colors.


14.1.3 Standard Illuminants

A central goal of colorimetry is the quantitative measurement of colors in physical reality, which strongly depends on the color properties of the illumination. The CIE system specifies a number of standard illuminants for a variety of real and hypothetical light sources, each specified by a spectral radiant power distribution and the “correlated color temperature” (expressed in kelvin, K) [258, Sec. 3.3.3]. The following daylight (D) illuminants are particularly important for the design of digital color spaces (Table 14.2):
D50 emulates the spectrum of natural (direct) sunlight with an equivalent color temperature of approximately 5000 K. D50 is the recommended illuminant for viewing reflective images, such as paper prints. In practice, D50 lighting is commonly implemented with fluorescent lamps using multiple phosphors to approximate the specified color spectrum.

D65 has a correlated color temperature of approximately 6500 K and is designed to emulate the average (indirect) daylight observed under an overcast sky in the northern hemisphere. D65 is also used as the reference white for emissive devices, such as display screens.

The standard illuminants serve not only to specify the ambient viewing light but also to define the reference white points of various color spaces in the CIE color system. For example, the sRGB standard (see Sec. 14.4) refers to D65 as the media white point and D50 as the ambient viewing illuminant. In addition, the CIE system also specifies the range of admissible viewing angles (commonly ±2°).


Pt.     T (K)  X        Y        Z        x       y
D50     5000   0.96429  1.00000  0.82510  0.3457  0.3585
D65     6500   0.95045  1.00000  1.08905  0.3127  0.3290
E       -      1.00000  1.00000  1.00000  0.3333  0.3333

14.1.4 Gamut

The set of all colors that can be handled by a certain media device or can be represented by a particular color space is called its “gamut”. This is usually a contiguous region in the 3D CIE XYZ color space or, reduced to the representable color hues and ignoring the luminance component, a convex region in the 2D CIE chromaticity diagram.

Figure 14.3 illustrates some typical gamut regions inside the CIE diagram. The gamut of an output device mainly depends on the technology employed. For example, ordinary color monitors are typically not capable of displaying all colors of the gamut covered by the corresponding color space (usually sRGB). Conversely, some devices can reproduce certain colors that cannot be represented in the color space being used. Significant deviations exist, for example, between the RGB color space and the gamuts associated with CMYK-based printers. Also, media devices with very large gamuts exist, as demonstrated by the laser display system in Fig. 14.3. Representing such large gamuts and, in particular, transforming between different color representations requires adequately sized color spaces, such as the Adobe RGB color space or CIELAB (described in Sec. 14.2), the latter of which covers the entire visible portion of the CIE diagram.
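Because a color space defined by three primaries covers a triangle in the chromaticity diagram, checking whether a given chromaticity lies within that 2D gamut reduces to a point-in-triangle test. The Java sketch below uses the xy coordinates of the sRGB primaries listed in Table 14.5; the class and method names are hypothetical. The test deliberately ignores luminance, so it says nothing about whether a color is reproducible at a particular brightness.

```java
/**
 * Sketch: tests whether a chromaticity point (x, y) lies inside the
 * triangle spanned by three primaries in the CIE xy diagram.
 */
final class GamutCheck {

    /** xy chromaticities of the sRGB primaries R, G, B (Table 14.5). */
    static final double[][] SRGB_PRIMARIES = {
        {0.6400, 0.3300}, // R
        {0.3000, 0.6000}, // G
        {0.1500, 0.0600}  // B
    };

    private GamutCheck() { }

    /** z-component of the cross product (b - a) x (p - a): on which side of edge ab p lies. */
    private static double side(double[] a, double[] b, double x, double y) {
        return (b[0] - a[0]) * (y - a[1]) - (b[1] - a[1]) * (x - a[0]);
    }

    /** True if (x, y) lies inside or on the border of the triangle spanned by prim. */
    static boolean insideTriangle(double[][] prim, double x, double y) {
        double d1 = side(prim[0], prim[1], x, y);
        double d2 = side(prim[1], prim[2], x, y);
        double d3 = side(prim[2], prim[0], x, y);
        boolean hasNeg = d1 < 0 || d2 < 0 || d3 < 0;
        boolean hasPos = d1 > 0 || d2 > 0 || d3 > 0;
        return !(hasNeg && hasPos); // inside iff no two signs disagree
    }
}
```

The D65 white point (0.3127, 0.3290) is reported as inside the sRGB triangle, whereas a strongly saturated green such as (0.1, 0.8) is reported as outside.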

14.1.5 Variants of the CIE Color Space

The original CIEXYZ color space and the derived xy chromaticity diagram have the disadvantage that color differences are not perceived equally in different regions of the color space. For example, large color changes are perceived in the magenta region for a given shift in XYZ, while the change is relatively small in the green region for the same coordinate distance. Several variants of the CIE color space have been developed for different purposes, primarily with the goal of creating perceptually uniform color representations without sacrificing the formal qualities of the CIE reference system. Popular CIE-derived color spaces include CIE YUV, YU'V', YCbCr, and particularly CIELAB and CIELUV, which are described in the following sections. In addition, CIE-compliant specifications exist for most common color spaces (see Ch. 12, Sec. 12.2), which allow more or less dependable conversions between almost any pair of color spaces.

Table 14.2: CIE color parameters for the standard illuminants D50 and D65. E denotes the absolute neutral point in CIE XYZ space.

Fig. 14.3: Gamut regions for different color spaces and output devices inside the CIE diagram.


14.2 CIELAB

The CIELAB color model (specified by CIE in 1976) was developed with the goal of linearizing the representation with respect to human color perception and at the same time creating a more intuitive color system. Since then, CIELAB² has become a popular and widely used color model, particularly for high-quality photographic applications. It is used, for example, inside Adobe Photoshop as the standard model for converting between different color spaces. The dimensions in this color space are the luminosity L* and the two color components a*, b*, which specify the color hue and saturation along the green-red and blue-yellow axes, respectively. All three components are relative values and refer to the specified reference white point C_ref = (X_ref, Y_ref, Z_ref). In addition, a nonlinear correction function (similar to the modified gamma correction described in Ch. 4, Sec. 4.7.6) is applied to all three components, as detailed below.


14.2.1 CIEXYZ→CIELAB Conversion

Several specifications for converting to and from CIELAB space exist; they differ only marginally, however, and only for very small L* values. The current specification for converting between CIEXYZ and CIELAB colors is defined by ISO Standard 13655 [120] as follows:

$$\begin{aligned} L^* &= 116 \cdot Y' - 16, \qquad (14.4)\\ a^* &= 500 \cdot (X' - Y'), \qquad (14.5)\\ b^* &= 200 \cdot (Y' - Z'), \qquad (14.6) \end{aligned}$$

with

$$X' = f_1\!\left(\frac{X}{X_{\mathrm{ref}}}\right), \qquad Y' = f_1\!\left(\frac{Y}{Y_{\mathrm{ref}}}\right), \qquad Z' = f_1\!\left(\frac{Z}{Z_{\mathrm{ref}}}\right), \qquad (14.7)$$

$$f_1(c) = \begin{cases} c^{1/3} & \text{for } c > \epsilon,\\ \kappa \cdot c + \dfrac{16}{116} & \text{for } c \le \epsilon, \end{cases} \qquad (14.8)$$

and

$$\epsilon = \left(\frac{6}{29}\right)^3 = \frac{216}{24389} \approx 0.008856, \qquad (14.9)$$

$$\kappa = \frac{1}{116}\left(\frac{29}{3}\right)^3 = \frac{841}{108} \approx 7.787. \qquad (14.10)$$

² Often CIELAB is simply referred to as the “Lab” color space.

Fig. 14.4: CIELAB components shown as grayscale images. The contrast of the a* and b* images has been increased by 40% for better viewing.


For the conversion in Eqn. (14.7), D65 is usually specified as the reference white point C_ref = (X_ref, Y_ref, Z_ref), that is, X_ref = 0.95047, Y_ref = 1.0, and Z_ref = 1.08883 (see Table 14.2). The L* values are nonnegative and typically in the range [0, 100] (often scaled to [0, 255]), but may theoretically be greater. Values for a* and b* typically lie in the range [−127, +127]. Figure 14.4 shows the separation of a color image into the corresponding CIELAB components. Table 14.3 lists the relation between CIELAB and XYZ coordinates for selected RGB colors. The given R'G'B' values are (nonlinear) sRGB coordinates with D65 as the reference white point.³ Figure 14.5(c) shows the transformation of the RGB color cube into the CIELAB color space.
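The forward transformation in Eqns. (14.4) to (14.10) can be sketched in Java as follows. This is not the book's own program: the class name `XyzToLab` is hypothetical, the D65 reference white uses the values quoted above (0.95047, 1.0, 1.08883), and `Math.cbrt` provides the cube root in Eqn. (14.8).

```java
/**
 * Sketch of the CIEXYZ -> CIELAB conversion (Eqns. 14.4 to 14.10).
 * Class name and constants are illustrative.
 */
final class XyzToLab {

    /** D65 reference white as quoted in the text (X_ref, Y_ref, Z_ref). */
    static final double[] D65 = {0.95047, 1.0, 1.08883};

    static final double EPSILON = 216.0 / 24389.0; // (6/29)^3, Eqn. 14.9
    static final double KAPPA = 841.0 / 108.0;     // (1/116)(29/3)^3, Eqn. 14.10

    private XyzToLab() { }

    /** Nonlinear correction f1 (Eqn. 14.8). */
    static double f1(double c) {
        return (c > EPSILON) ? Math.cbrt(c) : KAPPA * c + 16.0 / 116.0;
    }

    /** Returns {L*, a*, b*} for the given XYZ color and reference white. */
    static double[] toLab(double X, double Y, double Z, double[] white) {
        double xp = f1(X / white[0]); // X' (Eqn. 14.7)
        double yp = f1(Y / white[1]); // Y'
        double zp = f1(Z / white[2]); // Z'
        double L = 116.0 * yp - 16.0;  // Eqn. 14.4
        double a = 500.0 * (xp - yp);  // Eqn. 14.5
        double b = 200.0 * (yp - zp);  // Eqn. 14.6
        return new double[] {L, a, b};
    }
}
```

With this reference white, the red entry of Table 14.3 (X = 0.4125, Y = 0.2127, Z = 0.0193) yields approximately L* = 53.24, a* = 80.1, b* = 67.2, matching the table up to rounding of the tabulated XYZ values. The constants ε and κ are chosen so that both branches of f1 meet at c = ε, where each gives 6/29.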


³ Note that sRGB colors in Java are specified with respect to white point D50, which explains certain numerical deviations (see Sec. 14.7).

Table 14.3: CIELAB coordinates for selected color points in sRGB. The sRGB components R', G', B' are nonlinear (i.e., gamma-corrected); the white point is D65 (see Table 14.2).

                 sRGB                CIEXYZ (D65)            CIELAB
Pt.   Color      R'    G'    B'      X65     Y65     Z65           L*       a*        b*
S     Black      0.00  0.00  0.00    0.0000  0.0000  0.0000      0.00     0.00      0.00
R     Red        1.00  0.00  0.00    0.4125  0.2127  0.0193     53.24    80.09     67.20
Y     Yellow     1.00  1.00  0.00    0.7700  0.9278  0.1385     97.14   -21.55     94.48
G     Green      0.00  1.00  0.00    0.3576  0.7152  0.1192     87.74   -86.18     83.18
C     Cyan       0.00  1.00  1.00    0.5380  0.7873  1.0694     91.11   -48.09    -14.13
B     Blue       0.00  0.00  1.00    0.1804  0.0722  0.9502     32.30    79.19   -107.86
M     Magenta    1.00  0.00  1.00    0.5929  0.2848  0.9696     60.32    98.24    -60.83
W     White      1.00  1.00  1.00    0.9505  1.0000  1.0888    100.00     0.00      0.00
K     50% Gray   0.50  0.50  0.50    0.2034  0.2140  0.2330     53.39     0.00      0.00
R75   75% Red    0.75  0.00  0.00    0.2155  0.1111  0.0101     39.77    64.51     54.13
R50   50% Red    0.50  0.00  0.00    0.0883  0.0455  0.0041     25.42    47.91     37.91
R25   25% Red    0.25  0.00  0.00    0.0210  0.0108  0.0010      9.66    29.68     15.24
P     Pink       1.00  0.50  0.50    0.5276  0.3812  0.2482     68.11    48.39     22.83

14.2.2 CIELAB→CIEXYZ Conversion

The reverse transformation from CIELAB space to CIEXYZ coordinates is defined as follows:

$$\begin{aligned} X &= X_{\mathrm{ref}} \cdot f_2\!\left(L' + \frac{a^*}{500}\right), \qquad (14.11)\\ Y &= Y_{\mathrm{ref}} \cdot f_2(L'), \qquad (14.12)\\ Z &= Z_{\mathrm{ref}} \cdot f_2\!\left(L' - \frac{b^*}{200}\right), \qquad (14.13) \end{aligned}$$

with

$$L' = \frac{L^* + 16}{116}, \qquad (14.14)$$

$$f_2(c) = \begin{cases} c^3 & \text{for } c^3 > \epsilon,\\ \dfrac{c - 16/116}{\kappa} & \text{for } c^3 \le \epsilon, \end{cases} \qquad (14.15)$$

and ε, κ as defined in Eqns. (14.9) and (14.10). The complete Java code for the CIELAB→XYZ conversion and the implementation of the associated ColorSpace class can be found in Progs. 14.1 and 14.2 (pp. 363-364).
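The inverse mapping of Eqns. (14.11) to (14.15) is sketched below in Java. It is a standalone illustration, not the Prog. 14.1 referenced above: the class name `LabToXyz` is hypothetical, and the D65 reference white is the one assumed for the forward direction.

```java
/**
 * Sketch of the CIELAB -> CIEXYZ conversion (Eqns. 14.11 to 14.15).
 * Class name and constants are illustrative.
 */
final class LabToXyz {

    static final double[] D65 = {0.95047, 1.0, 1.08883};
    static final double EPSILON = 216.0 / 24389.0; // Eqn. 14.9
    static final double KAPPA = 841.0 / 108.0;     // Eqn. 14.10

    private LabToXyz() { }

    /** Inverse correction f2 (Eqn. 14.15). */
    static double f2(double c) {
        double c3 = c * c * c;
        return (c3 > EPSILON) ? c3 : (c - 16.0 / 116.0) / KAPPA;
    }

    /** Returns {X, Y, Z} for the given CIELAB color and reference white. */
    static double[] toXyz(double L, double a, double b, double[] white) {
        double lp = (L + 16.0) / 116.0;               // L' (Eqn. 14.14)
        double X = white[0] * f2(lp + a / 500.0);     // Eqn. 14.11
        double Y = white[1] * f2(lp);                 // Eqn. 14.12
        double Z = white[2] * f2(lp - b / 200.0);     // Eqn. 14.13
        return new double[] {X, Y, Z};
    }
}
```

Feeding in the CIELAB values of the red entry in Table 14.3 (53.24, 80.09, 67.20) returns approximately (0.4125, 0.2127, 0.0193).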


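14.3 CIELUV

14.3.1 CIEXYZ→CIELUV Conversion

The CIELUV component values L*, u*, v* are calculated from given X, Y, Z color coordinates as follows:

$$\begin{aligned} L^* &= 116 \cdot Y' - 16, \qquad (14.16)\\ u^* &= 13 \cdot L^* \cdot (u' - u'_{\mathrm{ref}}), \qquad (14.17)\\ v^* &= 13 \cdot L^* \cdot (v' - v'_{\mathrm{ref}}), \qquad (14.18) \end{aligned}$$

with Y' as defined in Eqn. (14.7) (identical to CIELAB) and

$$\begin{aligned} u' &= f_u(X, Y, Z), & u'_{\mathrm{ref}} &= f_u(X_{\mathrm{ref}}, Y_{\mathrm{ref}}, Z_{\mathrm{ref}}),\\ v' &= f_v(X, Y, Z), & v'_{\mathrm{ref}} &= f_v(X_{\mathrm{ref}}, Y_{\mathrm{ref}}, Z_{\mathrm{ref}}), \qquad (14.19) \end{aligned}$$

with the correction functions

$$f_u(X, Y, Z) = \begin{cases} 0 & \text{for } X = 0,\\ \dfrac{4X}{X + 15Y + 3Z} & \text{for } X > 0, \end{cases} \qquad (14.20)$$

$$f_v(X, Y, Z) = \begin{cases} 0 & \text{for } Y = 0,\\ \dfrac{9Y}{X + 15Y + 3Z} & \text{for } Y > 0. \end{cases} \qquad (14.21)$$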


[Fig. 14.5 panels: columns linear RGB (left) and sRGB (right); rows XYZ (a, b), CIELAB (c, d), CIELUV (e, f)]

Fig. 14.5: Transformation of the RGB color cube to the XYZ, CIELAB, and CIELUV color spaces. The left column shows the color cube in linear RGB space, the right column in nonlinear sRGB space. Both RGB volumes were uniformly subdivided into 10 × 10 × 10 cubes of equal size. In both cases, the transformation to XYZ space (a, b) yields a distorted cube with straight edges and planar faces. Due to the linear transformation from RGB to XYZ, the subdivision of the RGB cube remains uniform (a). However, the nonlinear transformation (due to gamma correction) from sRGB to XYZ makes the tessellation strongly nonuniform in XYZ space (b). Since CIELAB uses gamma correction as well, the transformation of the linear RGB cube in (c) appears much less uniform than the nonlinear sRGB cube in (d), although this appears to be the other way round in CIELUV (e, f). Note that the RGB/sRGB color cube maps to a non-convex volume in both the CIELAB and the CIELUV space.


Note that the checks for zero X, Y in Eqns. (14.20) and (14.21) are not part of the original definitions but are essential in any real implementation to avoid divisions by zero.⁴

⁴ Remember, though, that floating-point values (double, float) should never be strictly tested against zero but compared to a sufficiently small (epsilon) quantity (see Sec. F.1.8 in the Appendix).
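A Java sketch of Eqns. (14.16) to (14.21) follows. It is illustrative rather than definitive: the class name `XyzToLuv` is hypothetical, the D65 reference white repeats the values used for CIELAB, and, following the advice in footnote 4, the zero tests in f_u and f_v compare against a small tolerance `ZERO_TOL` whose value (1e-12) is an arbitrary choice.

```java
/**
 * Sketch of the CIEXYZ -> CIELUV conversion (Eqns. 14.16 to 14.21).
 * Class name and constants are illustrative.
 */
final class XyzToLuv {

    static final double[] D65 = {0.95047, 1.0, 1.08883};
    static final double EPSILON = 216.0 / 24389.0; // Eqn. 14.9
    static final double KAPPA = 841.0 / 108.0;     // Eqn. 14.10
    // Tolerance for the zero tests in Eqns. 14.20/14.21 (cf. footnote 4);
    // the value is an arbitrary choice for this sketch.
    static final double ZERO_TOL = 1e-12;

    private XyzToLuv() { }

    /** Same correction function as for CIELAB (Eqn. 14.8). */
    static double f1(double c) {
        return (c > EPSILON) ? Math.cbrt(c) : KAPPA * c + 16.0 / 116.0;
    }

    /** f_u (Eqn. 14.20). */
    static double fu(double X, double Y, double Z) {
        return (X < ZERO_TOL) ? 0.0 : 4 * X / (X + 15 * Y + 3 * Z);
    }

    /** f_v (Eqn. 14.21). */
    static double fv(double X, double Y, double Z) {
        return (Y < ZERO_TOL) ? 0.0 : 9 * Y / (X + 15 * Y + 3 * Z);
    }

    /** Returns {L*, u*, v*} for the given XYZ color and reference white. */
    static double[] toLuv(double X, double Y, double Z, double[] white) {
        double L = 116.0 * f1(Y / white[1]) - 16.0;     // Eqn. 14.16
        double uRef = fu(white[0], white[1], white[2]); // Eqn. 14.19
        double vRef = fv(white[0], white[1], white[2]);
        double u = 13 * L * (fu(X, Y, Z) - uRef);        // Eqn. 14.17
        double v = 13 * L * (fv(X, Y, Z) - vRef);        // Eqn. 14.18
        return new double[] {L, u, v};
    }
}
```

For the red entry (X = 0.4125, Y = 0.2127, Z = 0.0193) this gives roughly L* = 53.24, u* = 175.0, v* = 37.8, in line with Table 14.4.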
Table 14.4: CIELUV coordinates for selected color points in sRGB. Reference white point is D65. The L* values are identical to CIELAB (see Table 14.3).

                 sRGB                CIEXYZ (D65)            CIELUV
Pt.   Color      R'    G'    B'      X65     Y65     Z65           L*       u*        v*
S     Black      0.00  0.00  0.00    0.0000  0.0000  0.0000      0.00     0.00      0.00
R     Red        1.00  0.00  0.00    0.4125  0.2127  0.0193     53.24   175.01     37.75
Y     Yellow     1.00  1.00  0.00    0.7700  0.9278  0.1385     97.14     7.70    106.78
G     Green      0.00  1.00  0.00    0.3576  0.7152  0.1192     87.74   -83.08    107.39
C     Cyan       0.00  1.00  1.00    0.5380  0.7873  1.0694     91.11   -70.48    -15.20
B     Blue       0.00  0.00  1.00    0.1804  0.0722  0.9502     32.30    -9.40   -130.34
M     Magenta    1.00  0.00  1.00    0.5929  0.2848  0.9696     60.32    84.07   -108.68
W     White      1.00  1.00  1.00    0.9505  1.0000  1.0888    100.00     0.00      0.00
K     50% Gray   0.50  0.50  0.50    0.2034  0.2140  0.2330     53.39     0.00      0.00
R75   75% Red    0.75  0.00  0.00    0.2155  0.1111  0.0101     39.77   130.73     28.20
R50   50% Red    0.50  0.00  0.00    0.0883  0.0455  0.0041     25.42    83.56     18.02
R25   25% Red    0.25  0.00  0.00    0.0210  0.0108  0.0010      9.66    31.74      6.85
P     Pink       1.00  0.50  0.50    0.5276  0.3812  0.2482     68.11    92.15     19.88

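14.3.2 CIELUV→CIEXYZ Conversion

The reverse mapping from L*, u*, v* components to X, Y, Z coordinates is defined as follows:

$$Y = Y_{\mathrm{ref}} \cdot f_2\!\left(\frac{L^* + 16}{116}\right), \qquad (14.22)$$

with f_2() as defined in Eqn. (14.15), and

$$X = Y \cdot \frac{9u'}{4v'}, \qquad Z = Y \cdot \frac{12 - 3u' - 20v'}{4v'}, \qquad (14.23)$$

with

$$(u', v') = \begin{cases} (u'_{\mathrm{ref}},\, v'_{\mathrm{ref}}) & \text{for } L^* = 0,\\ (u'_{\mathrm{ref}},\, v'_{\mathrm{ref}}) + \dfrac{1}{13 \cdot L^*} \cdot (u^*, v^*) & \text{for } L^* > 0, \end{cases} \qquad (14.24)$$

and u'_ref, v'_ref as in Eqn. (14.19).⁵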
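The inverse mapping in Eqns. (14.22) to (14.24) can be sketched in Java as shown below. The class name is hypothetical and the D65 reference white is the one assumed above; as noted in footnote 5, no explicit check for v' = 0 is made. Treating every L* ≤ 0 (not only exactly 0) as the first case of Eqn. (14.24) is a choice of this sketch.

```java
/**
 * Sketch of the CIELUV -> CIEXYZ conversion (Eqns. 14.22 to 14.24).
 * Class name and constants are illustrative.
 */
final class LuvToXyz {

    static final double[] D65 = {0.95047, 1.0, 1.08883};
    static final double EPSILON = 216.0 / 24389.0; // Eqn. 14.9
    static final double KAPPA = 841.0 / 108.0;     // Eqn. 14.10

    private LuvToXyz() { }

    /** Inverse correction f2 (Eqn. 14.15). */
    static double f2(double c) {
        double c3 = c * c * c;
        return (c3 > EPSILON) ? c3 : (c - 16.0 / 116.0) / KAPPA;
    }

    /** Returns {X, Y, Z} for the given CIELUV color and reference white. */
    static double[] toXyz(double L, double u, double v, double[] white) {
        // u'_ref, v'_ref (Eqn. 14.19); the reference white has X, Y > 0
        double d = white[0] + 15 * white[1] + 3 * white[2];
        double uRef = 4 * white[0] / d;
        double vRef = 9 * white[1] / d;

        double Y = white[1] * f2((L + 16.0) / 116.0); // Eqn. 14.22

        double up = uRef; // u', v' for L* = 0 (Eqn. 14.24)
        double vp = vRef;
        if (L > 0) {      // case L* > 0
            up += u / (13 * L);
            vp += v / (13 * L);
        }
        double X = Y * 9 * up / (4 * vp);                  // Eqn. 14.23
        double Z = Y * (12 - 3 * up - 20 * vp) / (4 * vp);
        return new double[] {X, Y, Z};
    }
}
```

Feeding in the CIELUV values of the red entry in Table 14.4 (53.24, 175.01, 37.75) returns approximately X = 0.4125, Y = 0.2127, Z = 0.0193.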


14.3.3 Measuring Color Differences

Due to its high uniformity with respect to human color perception, the CIELAB color space is a particularly good choice for determining the difference between colors (the same holds for the CIELUV space) [94, p. 57]. The difference between two color points c1 = (L1*, a1*, b1*) and c2 = (L2*, a2*, b2*) can be found by simply measuring the Euclidean distance in CIELAB or CIELUV space, for example,

$$\begin{aligned} \mathrm{ColorDist}(c_1, c_2) &= \lVert c_1 - c_2 \rVert \qquad (14.25)\\ &= \sqrt{(L_1^* - L_2^*)^2 + (a_1^* - a_2^*)^2 + (b_1^* - b_2^*)^2}. \qquad (14.26) \end{aligned}$$
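Eqn. (14.26) amounts to a few lines of Java. The sketch below (class name hypothetical) works unchanged for CIELAB and CIELUV triples, since both are ordered as (L*, first color component, second color component).

```java
/** Sketch of the Euclidean color distance in CIELAB or CIELUV (Eqns. 14.25 and 14.26). */
final class ColorDistance {

    private ColorDistance() { }

    /** Distance between two triples (L*, a*, b*) or (L*, u*, v*). */
    static double dist(double[] c1, double[] c2) {
        double dL = c1[0] - c2[0];
        double d1 = c1[1] - c2[1];
        double d2 = c1[2] - c2[2];
        return Math.sqrt(dL * dL + d1 * d1 + d2 * d2);
    }
}
```

For instance, `dist(new double[] {50, 0, 0}, new double[] {50, 3, 4})` returns 5.0.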


⁵ No explicit check for zero denominators is required in Eqn. (14.23) since v' can be assumed to be greater than zero.

14.4 Standard RGB (sRGB)

CIE-based color spaces such as CIELAB (and CIELUV) are device-independent and have a gamut sufficiently large to represent virtually all visible colors in the CIEXYZ system. However, in many computer-based, display-oriented applications, such as computer graphics or multimedia, the direct use of CIE-based color spaces may be too cumbersome or inefficient.

sRGB (“standard RGB” [119]) was developed (jointly by Hewlett-Packard and Microsoft) with the goal of creating a precisely specified color space for these applications, based on standardized mappings with respect to the colorimetric CIEXYZ color space. This includes precise specifications of the three primary colors, the white reference point, ambient lighting conditions, and gamma values. Interestingly, the sRGB color specification is the same as the one specified many years before for the European PAL/SECAM television standards. Compared to CIELAB, sRGB exhibits a relatively small gamut (see Fig. 14.3), which, however, includes most colors that can be reproduced by current computer and video monitors. Although sRGB was not designed as a universal color space, its CIE-based specification at least permits more or less exact conversions to and from other color spaces.

Several standard image formats, including EXIF (JPEG) and PNG, are based on sRGB color data, which makes sRGB the de facto standard for digital still cameras, color printers, and other imaging devices at the consumer level [107]. sRGB is used as a relatively dependable archive format for digital images, particularly in less demanding applications that do not require (or allow) explicit color management [225]. Thus, in practice, working with any RGB color data almost always means dealing with sRGB. It is therefore no coincidence that sRGB is also the common color scheme in Java and is extensively supported by the Java standard API (see Sec. 14.7 for details).

Table 14.5 lists the key parameters of the sRGB color space (i.e., the XYZ coordinates for the primary colors R, G, B and the white point W (D65)), which are defined according to ITU-R BT.709 [122] (see Tables 14.1 and 14.2). Together, these values permit the unambiguous mapping of all other colors in the CIE diagram.

Table 14.5: sRGB tristimulus values R, G, B with reference to the white point D65 (W).

Pt.   R    G    B      X65       Y65       Z65       x65     y65
R     1.0  0.0  0.0    0.412453  0.212671  0.019334  0.6400  0.3300
G     0.0  1.0  0.0    0.357580  0.715160  0.119193  0.3000  0.6000
B     0.0  0.0  1.0    0.180423  0.072169  0.950227  0.1500  0.0600
W     1.0  1.0  1.0    0.950456  1.000000  1.088754  0.3127  0.3290

14.4.1 Linear vs. Nonlinear Color Components

sRGB is a nonlinear color space with respect to the XYZ coordinate system, and it is important to carefully distinguish between the linear and nonlinear RGB component values. The nonlinear values (denoted R', G', B') represent the actual color tuples, that is, the data values read from an image file or received from a digital camera. These values are pre-corrected with a fixed gamma (≈ 2.2) such that they can be easily viewed on a common color monitor without any additional conversion. The corresponding linear components (denoted R, G, B) relate to the CIEXYZ color space by a linear mapping and can thus be computed from X, Y, Z coordinates and vice versa by simple matrix multiplication, that is,

$$\begin{pmatrix} R\\ G\\ B \end{pmatrix} = \mathbf{M}_{\mathrm{RGB}} \cdot \begin{pmatrix} X\\ Y\\ Z \end{pmatrix} \quad \text{and} \quad \begin{pmatrix} X\\ Y\\ Z \end{pmatrix} = \mathbf{M}_{\mathrm{RGB}}^{-1} \cdot \begin{pmatrix} R\\ G\\ B \end{pmatrix}, \qquad (14.27)$$

with

$$\mathbf{M}_{\mathrm{RGB}} = \begin{pmatrix} 3.240479 & -1.537150 & -0.498535\\ -0.969256 & 1.875992 & 0.041556\\ 0.055648 & -0.204043 & 1.057311 \end{pmatrix}, \qquad (14.28)$$

$$\mathbf{M}_{\mathrm{RGB}}^{-1} = \begin{pmatrix} 0.412453 & 0.357580 & 0.180423\\ 0.212671 & 0.715160 & 0.072169\\ 0.019334 & 0.119193 & 0.950227 \end{pmatrix}. \qquad (14.29)$$

Fig. 14.6: Color transformation from CIEXYZ to sRGB.


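Notice that the three column vectors of M_RGB^-1 (Eqn. (14.29)) are the coordinates of the primary colors R, G, B (tristimulus values) in XYZ space (cf. Table 14.5) and thus

$$\mathbf{R} = \mathbf{M}_{\mathrm{RGB}}^{-1} \cdot \begin{pmatrix} 1\\ 0\\ 0 \end{pmatrix}, \qquad \mathbf{G} = \mathbf{M}_{\mathrm{RGB}}^{-1} \cdot \begin{pmatrix} 0\\ 1\\ 0 \end{pmatrix}, \qquad \mathbf{B} = \mathbf{M}_{\mathrm{RGB}}^{-1} \cdot \begin{pmatrix} 0\\ 0\\ 1 \end{pmatrix}. \qquad (14.30)$$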
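The linear part of the sRGB mapping, Eqns. (14.27) to (14.30), is plain 3 × 3 matrix arithmetic. In the following Java sketch (class and method names hypothetical) the matrices are copied from Eqns. (14.28) and (14.29); multiplying M_RGB^-1 by a unit vector returns the corresponding primary from Table 14.5, as stated in Eqn. (14.30).

```java
/** Sketch of the linear XYZ <-> RGB mapping for sRGB (Eqns. 14.27 to 14.30). */
final class XyzRgbMatrices {

    /** M_RGB (Eqn. 14.28): linear RGB = M_RGB * XYZ. */
    static final double[][] M_RGB = {
        { 3.240479, -1.537150, -0.498535},
        {-0.969256,  1.875992,  0.041556},
        { 0.055648, -0.204043,  1.057311}
    };

    /** M_RGB^-1 (Eqn. 14.29): XYZ = M_RGB^-1 * linear RGB. */
    static final double[][] M_RGB_INV = {
        {0.412453, 0.357580, 0.180423},
        {0.212671, 0.715160, 0.072169},
        {0.019334, 0.119193, 0.950227}
    };

    private XyzRgbMatrices() { }

    /** Multiplies a 3 x 3 matrix by a 3-element column vector. */
    static double[] multiply(double[][] m, double[] v) {
        double[] r = new double[3];
        for (int i = 0; i < 3; i++) {
            r[i] = m[i][0] * v[0] + m[i][1] * v[1] + m[i][2] * v[2];
        }
        return r;
    }
}
```

Because both matrices are given to six decimal places, M_RGB · M_RGB^-1 reproduces the identity only approximately (errors on the order of 1e-6), and M_RGB applied to the white point W of Table 14.5 yields (1, 1, 1) to the same precision.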


14.4.2 CIEXYZ→sRGB Conversion

To transform a given XYZ color to sRGB (Fig. 14.6), we first compute the linear R, G, B values by multiplying the (X, Y, Z) coordinate vector with the matrix M_RGB (Eqn. (14.28)),

$$\begin{pmatrix} R\\ G\\ B \end{pmatrix} = \mathbf{M}_{\mathrm{RGB}} \cdot \begin{pmatrix} X\\ Y\\ Z \end{pmatrix}. \qquad (14.31)$$

Subsequently, a modified gamma correction (see Ch. 4, Sec. 4.7.6) with γ = 2.4 (which corresponds to an effective gamma value of ca. 2.2) is applied to the linear R, G, B values,

$$R' = f_1(R), \qquad G' = f_1(G), \qquad B' = f_1(B), \qquad (14.32)$$

with

$$f_1(c) = \begin{cases} 12.92 \cdot c & \text{for } c \le 0.0031308,\\ 1.055 \cdot c^{1/2.4} - 0.055 & \text{for } c > 0.0031308. \end{cases} \qquad (14.33)$$


[Fig. 14.6 diagram: CIEXYZ (X, Y, Z) → linear mapping M_RGB → linear RGB (R, G, B) → gamma correction f1 → sRGB (R', G', B')]

The resulting sRGB components R', G', B' are limited to the interval [0, 1] (see Table 14.6). To obtain discrete numbers, the R', G', B' values are finally scaled linearly to the 8-bit integer range [0, 255].
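Putting Eqns. (14.31) to (14.33) together with the final 8-bit scaling gives the following Java sketch. The class name is hypothetical, and so is the handling of out-of-range values: linear components outside [0, 1] (colors outside the sRGB gamut) are simply clipped here, which is only one of several possible strategies.

```java
/** Sketch of the CIEXYZ (D65) -> 8-bit sRGB conversion (Eqns. 14.31 to 14.33). */
final class XyzToSrgb {

    /** M_RGB from Eqn. 14.28 (XYZ -> linear RGB). */
    static final double[][] M_RGB = {
        { 3.240479, -1.537150, -0.498535},
        {-0.969256,  1.875992,  0.041556},
        { 0.055648, -0.204043,  1.057311}
    };

    private XyzToSrgb() { }

    /** Modified gamma correction f1 (Eqn. 14.33) for a linear value c in [0, 1]. */
    static double f1(double c) {
        return (c <= 0.0031308) ? 12.92 * c : 1.055 * Math.pow(c, 1.0 / 2.4) - 0.055;
    }

    /** XYZ -> nonlinear 8-bit sRGB values {R', G', B'} in [0, 255]. */
    static int[] toSrgb8(double X, double Y, double Z) {
        int[] rgb = new int[3];
        for (int i = 0; i < 3; i++) {
            // linear component (Eqn. 14.31)
            double lin = M_RGB[i][0] * X + M_RGB[i][1] * Y + M_RGB[i][2] * Z;
            // clip out-of-gamut values to [0, 1] (an assumption of this sketch)
            lin = Math.min(1.0, Math.max(0.0, lin));
            // gamma correction (Eqn. 14.32), then linear scaling to [0, 255]
            rgb[i] = (int) Math.round(255 * f1(lin));
        }
        return rgb;
    }
}
```

For example, the XYZ coordinates of the red primary in Table 14.5 map to (255, 0, 0) and those of the white point W to (255, 255, 255). The two branches of f1 meet at c = 0.0031308, where both give about 0.04045, the threshold that reappears in the inverse function.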


                 sRGB (nonlinear)    RGB (linear)            CIEXYZ (D65)
Pt.   Color      R'    G'    B'      R       G       B       X65     Y65     Z65
S     Black      0.00  0.00  0.00    0.0000  0.0000  0.0000  0.0000  0.0000  0.0000
R     Red        1.00  0.00  0.00    1.0000  0.0000  0.0000  0.4125  0.2127  0.0193
Y     Yellow     1.00  1.00  0.00    1.0000  1.0000  0.0000  0.7700  0.9278  0.1385
G     Green      0.00  1.00  0.00    0.0000  1.0000  0.0000  0.3576  0.7152  0.1192
C     Cyan       0.00  1.00  1.00    0.0000  1.0000  1.0000  0.5380  0.7873  1.0694
B     Blue       0.00  0.00  1.00    0.0000  0.0000  1.0000  0.1804  0.0722  0.9502
M     Magenta    1.00  0.00  1.00    1.0000  0.0000  1.0000  0.5929  0.2848  0.9696
W     White      1.00  1.00  1.00    1.0000  1.0000  1.0000  0.9505  1.0000  1.0888
K     50% Gray   0.50  0.50  0.50    0.2140  0.2140  0.2140  0.2034  0.2140  0.2330
R75   75% Red    0.75  0.00  0.00    0.5225  0.0000  0.0000  0.2155  0.1111  0.0101
R50   50% Red    0.50  0.00  0.00    0.2140  0.0000  0.0000  0.0883  0.0455  0.0041
R25   25% Red    0.25  0.00  0.00    0.0509  0.0000  0.0000  0.0210  0.0108  0.0010
P     Pink       1.00  0.50  0.50    1.0000  0.2140  0.2140  0.5276  0.3812  0.2482

14.4.3 sRGB→CIEXYZ Conversion

To calculate the reverse transformation from sRGB to XYZ, the given (nonlinear) R', G', B' values (in the range [0, 1]) are first linearized by inverting the gamma correction (Eqn. (14.33)), that is,

$$R = f_2(R'), \qquad G = f_2(G'), \qquad B = f_2(B'), \qquad (14.34)$$

with

$$f_2(c') = \begin{cases} \dfrac{c'}{12.92} & \text{for } c' \le 0.04045,\\ \left(\dfrac{c' + 0.055}{1.055}\right)^{2.4} & \text{for } c' > 0.04045. \end{cases} \qquad (14.35)$$

Subsequently, the linearized (R, G, B) vector is transformed to XYZ coordinates by multiplication with the inverse of the matrix M_RGB (Eqn. (14.29)),

$$\begin{pmatrix} X\\ Y\\ Z \end{pmatrix} = \mathbf{M}_{\mathrm{RGB}}^{-1} \cdot \begin{pmatrix} R\\ G\\ B \end{pmatrix}. \qquad (14.36)$$
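The reverse path, Eqns. (14.34) to (14.36), is sketched below for 8-bit input values. The class name is hypothetical; dividing by 255 to obtain R', G', B' in [0, 1] simply undoes the 8-bit scaling described in Sec. 14.4.2.

```java
/** Sketch of the 8-bit sRGB -> CIEXYZ (D65) conversion (Eqns. 14.34 to 14.36). */
final class SrgbToXyz {

    /** M_RGB^-1 from Eqn. 14.29 (linear RGB -> XYZ). */
    static final double[][] M_RGB_INV = {
        {0.412453, 0.357580, 0.180423},
        {0.212671, 0.715160, 0.072169},
        {0.019334, 0.119193, 0.950227}
    };

    private SrgbToXyz() { }

    /** Inverse gamma correction f2 (Eqn. 14.35) for a nonlinear value c in [0, 1]. */
    static double f2(double c) {
        return (c <= 0.04045) ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
    }

    /** Nonlinear 8-bit sRGB values -> {X, Y, Z}. */
    static double[] toXyz(int r8, int g8, int b8) {
        // undo the 8-bit scaling, then linearize (Eqn. 14.34)
        double[] lin = {f2(r8 / 255.0), f2(g8 / 255.0), f2(b8 / 255.0)};
        double[] xyz = new double[3];
        for (int i = 0; i < 3; i++) { // matrix multiplication (Eqn. 14.36)
            xyz[i] = M_RGB_INV[i][0] * lin[0] + M_RGB_INV[i][1] * lin[1] + M_RGB_INV[i][2] * lin[2];
        }
        return xyz;
    }
}
```

For example, `f2(0.5)` returns about 0.2140, the linear value listed for the 50% gray and 50% red entries in Table 14.6, and the input (255, 255, 255) maps to the white point W of Table 14.5.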


14.4.4 Calculations with Nonlinear sRGB Values

Due to the wide use of sRGB in digital photography, graphics, multimedia, Internet imaging, etc., there is a high probability that a given image is encoded in sRGB colors. If, for example, a JPEG image is opened with ImageJ or Java, the pixel values in the resulting data array are media-oriented (i.e., nonlinear R', G', B' components of the sRGB color space). Unfortunately, this fact is often overlooked by programmers, with the consequence that colors are incorrectly manipulated and reproduced.

As a general rule, any arithmetic operation on color values should always be performed on the linearized R, G, B components, which are obtained from the nonlinear R', G', B' values through the inverse gamma function f2 = f1^-1 (Eqn. (14.35)) and converted back again with f1 (Eqn. (14.33)).
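As a small illustration of this rule (the example is ours, not taken from the text), the following Java sketch averages two sRGB component values by linearizing them with f2, averaging, and re-encoding the result with f1. The class and method names are hypothetical.

```java
/** Sketch: averaging nonlinear sRGB component values in linear space. */
final class LinearBlend {

    private LinearBlend() { }

    /** f1 (Eqn. 14.33): linear -> nonlinear sRGB. */
    static double f1(double c) {
        return (c <= 0.0031308) ? 12.92 * c : 1.055 * Math.pow(c, 1.0 / 2.4) - 0.055;
    }

    /** f2 (Eqn. 14.35): nonlinear sRGB -> linear. */
    static double f2(double c) {
        return (c <= 0.04045) ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
    }

    /** Mean of two nonlinear component values c1, c2 in [0, 1], computed on linear values. */
    static double averageSrgb(double c1, double c2) {
        double linearMean = 0.5 * (f2(c1) + f2(c2));
        return f1(linearMean); // convert back to nonlinear sRGB
    }
}
```

Averaging black (0.0) and white (1.0) this way yields about 0.735 rather than the naive 0.5, because the mean of the linear intensities (0.5) corresponds to a larger nonlinear sRGB value.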

Table 14.6: CIEXYZ coordinates for selected sRGB colors. The table lists the nonlinear R', G', and B' components, the linearized R, G, and B values, and the corresponding X, Y, and Z coordinates (for white point D65). The linear and nonlinear RGB values are identical for the extremal points of the RGB color cube S, ..., W (top rows) because the gamma correction does not affect 0 and 1 component values. However, intermediate colors (K, ..., P, bottom rows) may exhibit large differences between the nonlinear and linear components (e.g., compare the R' and R values for R25).

Example: color to grayscale conversion

The principle of converting RGB colors to grayscale values by computing a weighted sum of the color components was described already in Chapter 12, Sec. 12.2.1, where we had simply ignored the issue of possible nonlinearities. As one may have guessed, however, the variables R, G, B, and Y in Eqn. (12.10) on p. 305,

$$Y = 0.2125 \cdot R + 0.7154 \cdot G + 0.0721 \cdot B, \qquad (14.37)$$

implicitly refer to linear color and gray values, respectively, and not the raw sRGB values! Based on Eqn. (14.37), the correct grayscale conversion from raw (nonlinear) sRGB components R', G', B' is

$$Y' = f_1\big(0.2125 \cdot f_2(R') + 0.7154 \cdot f_2(G') + 0.0721 \cdot f_2(B')\big), \qquad (14.38)$$

with f1() and f2() = f1^-1() as defined in Eqns. (14.33) and (14.35). The result (Y') is again a nonlinear, sRGB-compatible gray value; that is, the sRGB color tuple (Y', Y', Y') should have the same perceived luminance as the original color (R', G', B').
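Eqn. (14.38) can be sketched in Java as follows; the class name is hypothetical, and f1, f2 are the sRGB gamma functions of Eqns. (14.33) and (14.35).

```java
/** Sketch of the exact grayscale conversion for nonlinear sRGB values (Eqn. 14.38). */
final class SrgbGray {

    private SrgbGray() { }

    /** f1 (Eqn. 14.33): linear -> nonlinear sRGB. */
    static double f1(double c) {
        return (c <= 0.0031308) ? 12.92 * c : 1.055 * Math.pow(c, 1.0 / 2.4) - 0.055;
    }

    /** f2 (Eqn. 14.35): nonlinear sRGB -> linear. */
    static double f2(double c) {
        return (c <= 0.04045) ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
    }

    /** Gray value Y' for nonlinear components R', G', B' in [0, 1]. */
    static double grayExact(double r, double g, double b) {
        // linear luminance (Eqn. 14.37) from linearized components
        double y = 0.2125 * f2(r) + 0.7154 * f2(g) + 0.0721 * f2(b);
        return f1(y); // back to a nonlinear, sRGB-compatible value
    }
}
```

A neutral input such as (0.5, 0.5, 0.5) is returned unchanged (up to rounding), since the weights sum to 1, whereas pure red (1, 0, 0) yields Y' ≈ 0.498.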

Note that setting the components of an sRGB color pixel to three arbitrary but identical values Y',

$$(R', G', B') \leftarrow (Y', Y', Y'),$$

always creates a gray (colorless) pixel, despite the nonlinearities of the sRGB space. This is because the gamma correction (Eqns. (14.33) and (14.35)) applies evenly to all three color components, and thus any three identical values map to a (linearized) color on the straight gray line between the black point S and the white point W in XYZ space (cf. Fig. 14.1(b)).

For many applications, however, the following approximation to the exact grayscale conversion in Eqn. (14.38) is sufficient. It works without converting the sRGB values (i.e., directly on the nonlinear R', G', B' components) by computing a linear combination

$$Y' \approx w'_R \cdot R' + w'_G \cdot G' + w'_B \cdot B', \qquad (14.39)$$

with a slightly different set of weights; for example, w'_R = 0.309, w'_G = 0.609, w'_B = 0.082, as proposed in [188]. The resulting quantity from Eqn. (14.39) is sometimes called luma (compared to luminance in Eqn. (14.37)).
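The luma approximation of Eqn. (14.39) needs no gamma functions at all. This Java sketch (class name hypothetical) uses the example weights from [188] quoted above.

```java
/** Sketch of the luma approximation on nonlinear sRGB components (Eqn. 14.39). */
final class SrgbLuma {

    // Example weights for nonlinear components, as quoted from [188]
    static final double W_R = 0.309;
    static final double W_G = 0.609;
    static final double W_B = 0.082;

    private SrgbLuma() { }

    /** Approximate gray value computed directly from R', G', B' in [0, 1]. */
    static double luma(double r, double g, double b) {
        return W_R * r + W_G * g + W_B * b;
    }
}
```

The approximation is cheap but not exact: for pure red it gives 0.309, compared with about 0.498 from the exact conversion in Eqn. (14.38).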


14.5  Adobe  RGB 

A  distinct  weakness  of  sRGB  is  its  relatively  small  gamut,  which  is 
limited  to  the  range  of  colors  reproducible  by  ordinary  color  mon¬ 
itors.  This  causes  problems,  for  example,  in  printing,  where  larger 
gamuts  are  needed,  particularly  in  the  green  regions.  The  “Adobe 
RGB  (1998)”  [1]  color  space,  developed  by  Adobe  as  their  own  stan¬ 
dard,  is  based  on  the  same  general  concept  as  sRGB  but  exhibits  a 
significantly  larger  gamut  (Fig.  14.3),  which  extends  its  use  partic¬ 
ularly  to  print  applications.  Figure  14.7  shows  the  noted  difference 
between  the  sRGB  and  Adobe  RGB  gamuts  in  3D  CIEXYZ  color 
space. 

Fig. 14.7: Gamuts of sRGB and Adobe RGB shown in CIELAB color space. The volume of the sRGB gamut (a) is significantly smaller than the Adobe RGB gamut (b), particularly in the green color region. The tessellation corresponds to a uniform subdivision of the original RGB cubes (in the respective color spaces).

The neutral point of Adobe RGB corresponds to the D65 standard (with x = 0.3127, y = 0.3290), and the gamma value is 2.199 (compared with 2.4 for sRGB) for the forward correction and 1/2.199 for the inverse correction, respectively. The associated file specification provides for a number of different codings (8- to 16-bit integer and 32-bit floating point) for the color components. Adobe RGB is frequently used in professional photography as an alternative to the CIELAB color space and for picture archive applications.
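As a sketch of the Adobe RGB transfer characteristic, the following Java class assumes a pure power law without a linear segment and the exponent 563/256 (approximately 2.199); linear values are encoded with the exponent 1/γ and decoded with γ. The class name AdobeRgbGamma is hypothetical, and applications that need exact conformance should consult the Adobe RGB (1998) specification [1].

```java
// Hypothetical sketch of the Adobe RGB (1998) transfer function, assumed here
// to be a pure power law with gamma = 563/256 (about 2.199), no linear segment.
final class AdobeRgbGamma {

    static final double GAMMA = 563.0 / 256.0;   // = 2.19921875

    // linear component c in [0, 1] -> nonlinear (encoded) component c'
    static double encode(double c) {
        return Math.pow(c, 1.0 / GAMMA);
    }

    // nonlinear component c' in [0, 1] -> linear component c
    static double decode(double cp) {
        return Math.pow(cp, GAMMA);
    }
}
```

With this exponent, an encoded mid-gray value of 0.5 corresponds to a linear intensity of about 0.218.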


14.6  Chromatic  Adaptation 

The human eye has the capability to interpret colors as being constant under varying viewing conditions, in particular under varying illumination. A white sheet of paper appears white to us in bright daylight as well as under fluorescent lighting, although the spectral composition of the light that enters the eye is completely different in both situations. The CIE color system takes into account the color temperature of the ambient lighting because the exact interpretation of XYZ color values also requires knowledge of the corresponding reference white point. For example, a color value (X, Y, Z) specified with respect to the D50 reference white point is generally perceived differently when reproduced by a D65-based media device, although the absolute (i.e., measured) color is the same. Thus the actual meaning of XYZ values cannot be known without knowing the corresponding white point. This is known as relative colorimetry.

If colors are specified with respect to different white points, for example W1 = (X_W1, Y_W1, Z_W1) and W2 = (X_W2, Y_W2, Z_W2), they can be related by first applying a so-called chromatic adaptation transformation (CAT) [114, Ch. 34] in XYZ color space. This transformation determines, for given color coordinates (X1, Y1, Z1) and the associated white point W1, the new color coordinates (X2, Y2, Z2) relative to another white point W2.

14.6.1  XYZ  Scaling 

The simplest chromatic adaptation method is XYZ scaling, where the individual color coordinates are multiplied by the ratios of the corresponding white point coordinates, that is,

$$X_2 = X_1 \cdot \frac{X_{W2}}{X_{W1}}, \qquad Y_2 = Y_1 \cdot \frac{Y_{W2}}{Y_{W1}}, \qquad Z_2 = Z_1 \cdot \frac{Z_{W2}}{Z_{W1}}. \tag{14.40}$$


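For example, for converting colors (X65, Y65, Z65) related to the white point D65 = $(\hat{X}_{65}, \hat{Y}_{65}, \hat{Z}_{65})$ to the corresponding colors for white point D50 = $(\hat{X}_{50}, \hat{Y}_{50}, \hat{Z}_{50})$, the concrete scaling is

$$\begin{aligned}
X_{50} &= X_{65} \cdot \frac{\hat{X}_{50}}{\hat{X}_{65}} = X_{65} \cdot \frac{0.964296}{0.950456} = X_{65} \cdot 1.01456, \\
Y_{50} &= Y_{65} \cdot \frac{\hat{Y}_{50}}{\hat{Y}_{65}} = Y_{65} \cdot \frac{1.000000}{1.000000} = Y_{65}, \\
Z_{50} &= Z_{65} \cdot \frac{\hat{Z}_{50}}{\hat{Z}_{65}} = Z_{65} \cdot \frac{0.825105}{1.088754} = Z_{65} \cdot 0.757843.
\end{aligned} \tag{14.41}$$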


This  form  of  scaling  the  color  coordinates  in  XYZ  space  is  usually  not 
considered  a  good  color  adaptation  model  and  is  not  recommended 
for  high-quality  applications. 
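XYZ scaling (Eqn. (14.40)) amounts to three independent multiplications. The following minimal Java sketch (the class XyzScaling is hypothetical) takes the source and target white points as XYZ triples; with the D65 and D50 values used in Eqn. (14.41), it reproduces the scale factors 1.01456, 1.0, and 0.757843.

```java
// Minimal sketch of XYZ scaling (Eqn. 14.40): each coordinate is multiplied
// by the ratio of the corresponding target and source white point coordinates.
final class XyzScaling {

    // xyz: color relative to white point w1; result: color relative to w2
    static double[] adapt(double[] xyz, double[] w1, double[] w2) {
        double[] out = new double[3];
        for (int k = 0; k < 3; k++) {
            out[k] = xyz[k] * w2[k] / w1[k];
        }
        return out;
    }
}
```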


14.6.2  Bradford  Adaptation 


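The most common chromatic adaptation models are based on scaling the color coordinates not directly in XYZ but in a “virtual” R*G*B* color space obtained from the XYZ values by a linear transformation

$$\begin{pmatrix} R^* \\ G^* \\ B^* \end{pmatrix} = \mathbf{M}_{\mathrm{CAT}} \cdot \begin{pmatrix} X \\ Y \\ Z \end{pmatrix}, \tag{14.42}$$

where M_CAT is a 3 × 3 transformation matrix (defined in Eqn. (14.45)). After appropriate scaling, the R*G*B* coordinates are transformed back to XYZ, so the complete adaptation transform from color coordinates X1, Y1, Z1 (w.r.t. white point W1 = (X_W1, Y_W1, Z_W1)) to the new color coordinates X2, Y2, Z2 (w.r.t. white point W2 = (X_W2, Y_W2, Z_W2)) takes the form

$$\begin{pmatrix} X_2 \\ Y_2 \\ Z_2 \end{pmatrix} = \mathbf{M}_{\mathrm{CAT}}^{-1} \cdot \begin{pmatrix} \frac{R^*_{W2}}{R^*_{W1}} & 0 & 0 \\ 0 & \frac{G^*_{W2}}{G^*_{W1}} & 0 \\ 0 & 0 & \frac{B^*_{W2}}{B^*_{W1}} \end{pmatrix} \cdot \mathbf{M}_{\mathrm{CAT}} \cdot \begin{pmatrix} X_1 \\ Y_1 \\ Z_1 \end{pmatrix}, \tag{14.43}$$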

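where the diagonal elements R*_W2/R*_W1, G*_W2/G*_W1, and B*_W2/B*_W1 are the (constant) ratios of the R*G*B* values of the white points W2, W1, respectively; that is,

$$\begin{pmatrix} R^*_{W1} \\ G^*_{W1} \\ B^*_{W1} \end{pmatrix} = \mathbf{M}_{\mathrm{CAT}} \cdot \begin{pmatrix} X_{W1} \\ Y_{W1} \\ Z_{W1} \end{pmatrix}, \qquad \begin{pmatrix} R^*_{W2} \\ G^*_{W2} \\ B^*_{W2} \end{pmatrix} = \mathbf{M}_{\mathrm{CAT}} \cdot \begin{pmatrix} X_{W2} \\ Y_{W2} \\ Z_{W2} \end{pmatrix}. \tag{14.44}$$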

The “Bradford” model [114, p. 590] specifies for Eqn. (14.43) the particular transformation matrix

$$\mathbf{M}_{\mathrm{CAT}} = \begin{pmatrix} 0.8951 & 0.2664 & -0.1614 \\ -0.7502 & 1.7135 & 0.0367 \\ 0.0389 & -0.0685 & 1.0296 \end{pmatrix}. \tag{14.45}$$
See Table 14.2.

Fig. 14.8: Bradford chromatic adaptation from white point D65 to D50. The solid triangle represents the original RGB gamut for white point D65, with the primaries (R, G, B) located at the corner points. The dashed triangle is the corresponding gamut after chromatic adaptation to white point D50.


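Inserting the M_CAT matrix into Eqn. (14.43) gives the complete chromatic adaptation. For example, the resulting transformation for converting from D65-based to D50-based colors (i.e., W1 = D65, W2 = D50, as listed in Table 14.2) is

$$\begin{pmatrix} X_{50} \\ Y_{50} \\ Z_{50} \end{pmatrix} = \mathbf{M}_{50|65} \cdot \begin{pmatrix} X_{65} \\ Y_{65} \\ Z_{65} \end{pmatrix} = \begin{pmatrix} 1.047884 & 0.022928 & -0.050149 \\ 0.029603 & 0.990437 & -0.017059 \\ -0.009235 & 0.015042 & 0.752085 \end{pmatrix} \cdot \begin{pmatrix} X_{65} \\ Y_{65} \\ Z_{65} \end{pmatrix}, \tag{14.46}$$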


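and conversely from D50-based to D65-based colors (i.e., W1 = D50, W2 = D65),

$$\begin{pmatrix} X_{65} \\ Y_{65} \\ Z_{65} \end{pmatrix} = \mathbf{M}_{65|50} \cdot \begin{pmatrix} X_{50} \\ Y_{50} \\ Z_{50} \end{pmatrix} = \begin{pmatrix} 0.955513 & -0.023079 & 0.063190 \\ -0.028348 & 1.009992 & 0.021019 \\ 0.012300 & -0.020484 & 1.329993 \end{pmatrix} \cdot \begin{pmatrix} X_{50} \\ Y_{50} \\ Z_{50} \end{pmatrix}. \tag{14.47}$$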


Figure  14.8  illustrates  the  effects  of  adaptation  from  the  D65  white 
point  to  D50  in  the  CIE  x,  y  chromaticity  diagram.  A  short  list  of 
corresponding  color  coordinates  is  given  in  Table  14.7. 
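The construction of Eqns. (14.42) to (14.44) can be sketched in Java as follows. The class Bradford and its methods are hypothetical helpers, not part of the book's library; the white points are passed as XYZ triples, e.g., D65 = (0.950456, 1.0, 1.088754) and D50 = (0.964296, 1.0, 0.825105). With these values the resulting matrix should agree with Eqn. (14.46) to about three decimal places, and by construction it maps the source white point exactly onto the target white point.

```java
// Hypothetical helper: builds the Bradford adaptation matrix of Eqn. (14.43)
// for white points w1 -> w2 (given as XYZ), using M_CAT from Eqn. (14.45).
final class Bradford {

    static final double[][] M_CAT = {
        { 0.8951,  0.2664, -0.1614},
        {-0.7502,  1.7135,  0.0367},
        { 0.0389, -0.0685,  1.0296}};

    // matrix-vector product a * v (3x3 times 3)
    static double[] mult(double[][] a, double[] v) {
        double[] r = new double[3];
        for (int i = 0; i < 3; i++)
            for (int k = 0; k < 3; k++)
                r[i] += a[i][k] * v[k];
        return r;
    }

    // matrix-matrix product a * b (3x3 times 3x3)
    static double[][] mult(double[][] a, double[][] b) {
        double[][] r = new double[3][3];
        for (int i = 0; i < 3; i++)
            for (int j = 0; j < 3; j++)
                for (int k = 0; k < 3; k++)
                    r[i][j] += a[i][k] * b[k][j];
        return r;
    }

    // inverse of a 3x3 matrix (adjugate divided by determinant)
    static double[][] inverse(double[][] m) {
        double a = m[0][0], b = m[0][1], c = m[0][2];
        double d = m[1][0], e = m[1][1], f = m[1][2];
        double g = m[2][0], h = m[2][1], i = m[2][2];
        double c00 = e * i - f * h, c01 = f * g - d * i, c02 = d * h - e * g;
        double c10 = c * h - b * i, c11 = a * i - c * g, c12 = b * g - a * h;
        double c20 = b * f - c * e, c21 = c * d - a * f, c22 = a * e - b * d;
        double det = a * c00 + b * c01 + c * c02;
        return new double[][] {
            {c00 / det, c10 / det, c20 / det},
            {c01 / det, c11 / det, c21 / det},
            {c02 / det, c12 / det, c22 / det}};
    }

    // complete adaptation matrix M_CAT^-1 * diag(R*_W2/R*_W1, ...) * M_CAT
    static double[][] adaptationMatrix(double[] w1, double[] w2) {
        double[] r1 = mult(M_CAT, w1);   // R*G*B* of white point W1 (Eqn. 14.44)
        double[] r2 = mult(M_CAT, w2);   // R*G*B* of white point W2
        double[][] s = new double[3][3];
        for (int k = 0; k < 3; k++)
            s[k][k] = r2[k] / r1[k];
        return mult(inverse(M_CAT), mult(s, M_CAT));
    }
}
```

Calling adaptationMatrix(D65, D50) corresponds to M_50|65 in Eqn. (14.46), and swapping the arguments gives M_65|50 in Eqn. (14.47); applying the former to the red primary (0.4125, 0.2127, 0.0193) reproduces the D50 values of Table 14.7.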

The Bradford model is a widely used chromatic adaptation scheme, but several similar procedures have been proposed (see also Exercise 14.1). Generally speaking, chromatic adaptation and related problems have a long history in color engineering and are still active fields of scientific research [258, Ch. 5, Sec. 5.12].
Table 14.7: Bradford chromatic adaptation from white point D65 to D50 for selected sRGB colors. The XYZ coordinates X65, Y65, Z65 relate to the original white point D65 (W1). X50, Y50, Z50 are the corresponding coordinates for the new white point D50 (W2), obtained with the Bradford adaptation according to Eqn. (14.46).

                      sRGB              XYZ (D65)               XYZ (D50)
Pt.  Color        R'    G'    B'     X65     Y65     Z65     X50     Y50     Z50
S    Black      0.00  0.00  0.00  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000
R    Red        1.00  0.00  0.00  0.4125  0.2127  0.0193  0.4361  0.2225  0.0139
Y    Yellow     1.00  1.00  0.00  0.7700  0.9278  0.1385  0.8212  0.9394  0.1110
G    Green      0.00  1.00  0.00  0.3576  0.7152  0.1192  0.3851  0.7169  0.0971
C    Cyan       0.00  1.00  1.00  0.5380  0.7873  1.0694  0.5282  0.7775  0.8112
B    Blue       0.00  0.00  1.00  0.1804  0.0722  0.9502  0.1431  0.0606  0.7141
M    Magenta    1.00  0.00  1.00  0.5929  0.2848  0.9696  0.5792  0.2831  0.7280
W    White      1.00  1.00  1.00  0.9505  1.0000  1.0888  0.9643  1.0000  0.8251
K    50% Gray   0.50  0.50  0.50  0.2034  0.2140  0.2330  0.2064  0.2140  0.1766
R75  75% Red    0.75  0.00  0.00  0.2155  0.1111  0.0101  0.2279  0.1163  0.0073
R50  50% Red    0.50  0.00  0.00  0.0883  0.0455  0.0041  0.0933  0.0476  0.0030
R25  25% Red    0.25  0.00  0.00  0.0210  0.0108  0.0010  0.0222  0.0113  0.0007
P    Pink       1.00  0.50  0.50  0.5276  0.3812  0.2482  0.5492  0.3889  0.1876

14.7  Colorimetric  Support  in  Java 

sRGB is the standard color space in Java; that is, the components of color objects and RGB color images are gamma-corrected, nonlinear R', G', B' values (see Fig. 14.6). The nonlinear R', G', B' values are related to the linear R, G, B values by a modified gamma correction, as specified by the sRGB standard (Eqns. (14.33) and (14.35)).

14.7.1  Profile  Connection  Space  (PCS) 

The Java API (AWT) provides classes for representing color objects and color spaces, together with a rich set of corresponding methods. Java’s color system is designed after the ICC⁷ “color management architecture”, which uses a CIEXYZ-based device-independent color space called the “profile connection space” (PCS) [118, 121]. The PCS color space is used as the intermediate reference for converting colors between different color spaces. The ICC standard defines device profiles (see Sec. 14.7.4) that specify the transforms to convert between a device’s color space and the PCS. The advantage of this approach is that for any given device only a single color transformation (profile) must be specified to convert between device-specific colors and the unified, colorimetric profile connection space. Every ColorSpace class (or subclass) provides the methods fromCIEXYZ() and toCIEXYZ() to convert between device color values and XYZ coordinates in the standardized PCS. Figure 14.9 illustrates the principal application of ColorSpace objects for converting colors between different color spaces in Java using the XYZ space as a common “hub”.

Unlike the sRGB specification, the ICC specifies D50 (and not D65) as the illuminant white point for its default PCS color space (see Table 14.2). The reason is that the ICC standard was developed primarily for color management in photography, graphics, and printing, where D50 is normally used as the reflective media white point. The Java methods fromCIEXYZ() and toCIEXYZ() thus take and return X, Y, Z color coordinates that are relative to the D50 white point. The resulting coordinates for the primary colors (listed in Table 14.8) are different from the ones given for white point D65 (see Table 14.5)! This is a frequent cause of confusion, since the sRGB component values are D65-based (as specified by the sRGB standard) but Java’s XYZ values are relative to the D50 white point.

7 International Color Consortium (ICC, www.color.org).

Fig. 14.9: XYZ-based color conversion in Java. [Diagram: sRGB (nonlinear R'G'B', D65) via ColorSpace CS_sRGB, linear RGB (D65) via CS_LINEAR_RGB, and L*a*b* (D65) via LabColorSpace, each connected to the Profile Connection Space (D50) through toCIEXYZ() and fromCIEXYZ().] ColorSpace objects implement the methods fromCIEXYZ() and toCIEXYZ() to convert color vectors from and to the CIEXYZ color space, respectively. Colorimetric transformations between color spaces can be accomplished as a two-step process via the XYZ space. For example, to convert from sRGB to CIELAB, the sRGB color is first converted to XYZ and subsequently from XYZ to CIELAB. Notice that Java’s standard XYZ color space is based on the D50 white point, while most common color spaces refer to D65.

Table 14.8: Color coordinates for sRGB primaries and the white point in Java’s default XYZ color space. The white point W is equal to D50.

Pt.    R     G     B        X50       Y50       Z50     x50     y50
R    1.0   0.0   0.0   0.436108  0.222517  0.013931  0.6484  0.3309
G    0.0   1.0   0.0   0.385120  0.716873  0.097099  0.3212  0.5978
B    0.0   0.0   1.0   0.143064  0.060610  0.714075  0.1559  0.0660
W    1.0   1.0   1.0   0.964296  1.000000  0.825106  0.3457  0.3585


Chromatic adaptation (see Sec. 14.6) is used to convert between XYZ color coordinates that are measured with respect to different white points. The ICC specification [118] recommends a linear chromatic adaptation based on the Bradford model to convert between the D65-related XYZ coordinates (X65, Y65, Z65) and D50-related values (X50, Y50, Z50). This is also implemented by the Java API.

The complete mapping between the linearized sRGB color values (R, G, B) and the D50-based (X50, Y50, Z50) coordinates can be expressed as a linear transformation composed of the RGB→XYZ65 transformation by matrix M_RGB (Eqns. (14.28) and (14.29)) and the chromatic adaptation transformation XYZ65→XYZ50 defined by the matrix M_50|65 (Eqn. (14.46)),

$$\begin{pmatrix} X_{50} \\ Y_{50} \\ Z_{50} \end{pmatrix} = \mathbf{M}_{50|65} \cdot \mathbf{M}_{\mathrm{RGB}} \cdot \begin{pmatrix} R \\ G \\ B \end{pmatrix} = \begin{pmatrix} 0.436131 & 0.385147 & 0.143033 \\ 0.222527 & 0.716878 & 0.060600 \\ 0.013926 & 0.097080 & 0.713871 \end{pmatrix} \cdot \begin{pmatrix} R \\ G \\ B \end{pmatrix}, \tag{14.48}$$

and, in the reverse direction,
$$\begin{pmatrix} R \\ G \\ B \end{pmatrix} = \mathbf{M}_{\mathrm{RGB}}^{-1} \cdot \mathbf{M}_{65|50} \cdot \begin{pmatrix} X_{50} \\ Y_{50} \\ Z_{50} \end{pmatrix} = \begin{pmatrix} 3.133660 & -1.617140 & -0.490588 \\ -0.978808 & 1.916280 & 0.033444 \\ 0.071979 & -0.229051 & 1.405840 \end{pmatrix} \cdot \begin{pmatrix} X_{50} \\ Y_{50} \\ Z_{50} \end{pmatrix}. \tag{14.49}$$

Fig. 14.10: Transformation from D50-based CIEXYZ coordinates (X50, Y50, Z50) in Java’s Profile Connection Space (PCS) to nonlinear sRGB values (R', G', B'). The first step is chromatic adaptation from D50 to D65 (by M_65|50), followed by mapping the CIEXYZ coordinates to linear RGB values (by M_RGB^-1). Finally, gamma correction is applied individually to all three color components.

Equations (14.48) and (14.49) are the transformations implemented by the methods toCIEXYZ() and fromCIEXYZ(), respectively, for Java’s default sRGB ColorSpace class. Of course, these methods must also perform the necessary gamma correction between the linear R, G, B components and the actual (nonlinear) sRGB values R', G', B'. Figure 14.10 illustrates the complete transformation from D50-based PCS coordinates to nonlinear sRGB values.
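The pipeline of Fig. 14.10 (chromatic adaptation, linear transformation, gamma correction) can be sketched in Java using the combined matrix of Eqn. (14.49). This only illustrates the math and is not Java’s actual implementation of fromCIEXYZ(); the class PcsToSrgb is hypothetical, the sRGB gamma parameters are the standard ones, and clipping the linear components to [0, 1] before gamma correction is an assumption added here to handle out-of-gamut values.

```java
// Hypothetical sketch: D50-based PCS coordinates -> nonlinear sRGB values,
// using the combined matrix M_RGB^-1 * M_65|50 of Eqn. (14.49).
final class PcsToSrgb {

    static final double[][] M = {
        { 3.133660, -1.617140, -0.490588},
        {-0.978808,  1.916280,  0.033444},
        { 0.071979, -0.229051,  1.405840}};

    // sRGB forward gamma: linear component -> nonlinear component
    static double gammaFwd(double c) {
        return (c <= 0.0031308) ? 12.92 * c : 1.055 * Math.pow(c, 1.0 / 2.4) - 0.055;
    }

    static double[] toSrgb(double[] xyz50) {
        double[] srgb = new double[3];
        for (int k = 0; k < 3; k++) {
            // adaptation D50 -> D65 and XYZ -> linear RGB in a single step
            double lin = M[k][0] * xyz50[0] + M[k][1] * xyz50[1] + M[k][2] * xyz50[2];
            // clip to [0, 1] (an assumption of this sketch, for out-of-gamut colors)
            lin = Math.min(1.0, Math.max(0.0, lin));
            srgb[k] = gammaFwd(lin);
        }
        return srgb;
    }
}
```

Feeding the D50 white point (0.964296, 1.0, 0.825106) from Table 14.8 into toSrgb() yields values very close to (1, 1, 1), and the red primary of Table 14.8 maps to approximately (1, 0, 0).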


14.7.2 Color-Related Java Classes

The  Java  standard  API  offers  extensive  support  for  working  with 
colors  and  color  images.  The  most  important  classes  contained  in 
the  Java  AWT  package  are: 

•  Color:  defines  individual  color  objects. 

•  ColorSpace:  specifies  entire  color  spaces. 

• ColorModel: describes the structure of color images; e.g., full-color images or indexed-color images (see Prog. 12.3 on p. 301).

Class Color (java.awt.Color)

An object of class Color describes a particular color in the associated color space, which defines the number and type of the color components. Color objects are primarily used for graphic operations, such as to specify the color for drawing or filling graphic objects. Unless the color space is explicitly specified, new Color objects are created as sRGB colors. The arguments passed to the Color constructor methods may be either float components in the range [0, 1] or integers in the range [0, 255], as demonstrated by the following example:

import java.awt.Color;

Color pink = new Color(1.0f, 0.5f, 0.5f);   // float components in [0, 1]
Color blue = new Color(0, 0, 255);          // int components in [0, 255]

Note that in both cases the arguments are interpreted as nonlinear sRGB values (R', G', B'). Other constructor methods exist for class Color that also accept alpha (transparency) values. In addition, the Color class offers two useful static methods, RGBtoHSB() and HSBtoRGB(), for converting between sRGB and HSV⁸ colors (see Ch. 12, Sec. 12.2.3).
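The two static HSB methods of class Color can be tried with a few lines of Java. Color.RGBtoHSB() takes integer components in [0, 255] and returns hue, saturation, and brightness in [0, 1]; Color.HSBtoRGB() returns a packed int whose alpha byte is set to 255. The wrapper class HsbDemo is hypothetical.

```java
import java.awt.Color;

// Small wrapper around the static HSB conversion methods of java.awt.Color.
final class HsbDemo {

    // returns {hue, saturation, brightness}, each in [0, 1]
    static float[] toHsb(int r, int g, int b) {
        return Color.RGBtoHSB(r, g, b, null);   // null: a new array is allocated
    }

    // returns the packed ARGB value (alpha = 255)
    static int toRgb(float h, float s, float v) {
        return Color.HSBtoRGB(h, s, v);
    }
}
```

For example, pure blue (0, 0, 255) gives the hue 2/3, and toRgb(0f, 1f, 1f) returns the packed value 0xFFFF0000 (opaque red).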

Class ColorSpace (java.awt.color.ColorSpace)

An object of type ColorSpace represents an entire color space, such as sRGB or CMYK. Every subclass of ColorSpace (which itself is an abstract class) provides methods for converting its native colors to the CIEXYZ and sRGB color space and vice versa, such that conversions between arbitrary color spaces can easily be performed (through Java’s XYZ-based profile connection space). In the following example, we first create an instance of the default sRGB color space by invoking the static method ColorSpace.getInstance() and subsequently convert an sRGB color object (R', G', B') to the corresponding (X, Y, Z) coordinates in Java’s (D50-based) profile connection space:

import java.awt.color.ColorSpace;

// create an sRGB color space object:
ColorSpace sRGBcsp = ColorSpace.getInstance(ColorSpace.CS_sRGB);
float[] pink_RGB = new float[] {1.0f, 0.5f, 0.5f};

// convert from sRGB to XYZ:
float[] pink_XYZ = sRGBcsp.toCIEXYZ(pink_RGB);

Notice that color vectors are represented as float[] arrays for color conversions with ColorSpace objects. If required, the method getComponents() can be used to convert Color objects to float[] arrays. In summary, the types of color spaces that can be created with the ColorSpace.getInstance() method include:

• CS_sRGB: the standard (D65-based) RGB color space with nonlinear R', G', B' components, as specified in [119].

• CS_LINEAR_RGB: color space with linear R, G, B components (i.e., no gamma correction applied).

• CS_GRAY: single-component color space with linear grayscale values.

• CS_PYCC: Kodak’s Photo YCC color space.

• CS_CIEXYZ: the default XYZ profile connection space (based on the D50 white point).
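The “hub” principle of Fig. 14.9 can be illustrated with the standard color spaces listed above. The following Java sketch (the class PcsHub is hypothetical) converts a color from one ColorSpace to another by chaining toCIEXYZ() and fromCIEXYZ(). Converting the sRGB gray value 0.5 to CS_LINEAR_RGB this way should give roughly 0.214 per component, although the exact digits depend on the JDK’s color management module.

```java
import java.awt.color.ColorSpace;

// Converts a color between two ColorSpace objects via the (D50-based)
// CIEXYZ profile connection space, following the scheme of Fig. 14.9.
final class PcsHub {

    static float[] convert(ColorSpace from, ColorSpace to, float[] color) {
        float[] xyz50 = from.toCIEXYZ(color);   // source space -> PCS (XYZ, D50)
        return to.fromCIEXYZ(xyz50);            // PCS -> target space
    }
}
```

A typical call is PcsHub.convert(ColorSpace.getInstance(ColorSpace.CS_sRGB), ColorSpace.getInstance(ColorSpace.CS_LINEAR_RGB), color). Note that the intermediate XYZ values are relative to D50, so sRGB white maps to approximately (0.964, 1.0, 0.825) rather than to the D65 white point.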

Other color spaces can be implemented by creating additional implementations (subclasses) of ColorSpace, as demonstrated for CIELAB in the example in Sec. 14.7.3.

14.7.3 Implementation of the CIELAB Color Space (Example)

In the following, we show a complete implementation of the CIELAB color space, which is not available in the current Java API, based on the specification given in Sec. 14.2. For this purpose, we define a subclass of ColorSpace (defined in the package java.awt.color) named LabColorSpace, which implements the required methods toCIEXYZ() and fromCIEXYZ() for converting to and from Java’s default profile connection space, respectively, and toRGB() and fromRGB() for converting between CIELAB and sRGB (Progs. 14.1 and 14.2). These conversions are performed in two steps via XYZ coordinates, where care must be taken regarding the right choice of the associated white point (CIELAB is based on D65 and Java XYZ on D50). The following examples demonstrate the principal use of the new LabColorSpace class:⁹

8 The HSV color space is referred to as “HSB” (hue, saturation, brightness) in the Java API.

ColorSpace labCs = new LabColorSpace();
float[] cyan_sRGB = {0.0f, 1.0f, 1.0f};
float[] cyan_LAB = labCs.fromRGB(cyan_sRGB);    // sRGB -> LAB
float[] cyan_XYZ = labCs.toCIEXYZ(cyan_LAB);    // LAB -> XYZ (D50)


14.7.4  ICC  Profiles 

Even with the most precise specification, a standard color space may not be sufficient to accurately describe the transfer characteristics of some input or output device. ICC¹⁰ profiles are standardized descriptions of individual device transfer properties that ensure that an image or graphics can be reproduced accurately on different media. The contents and the format of ICC profile files are specified in [118], which is identical to ISO standard 15076 [121]. Profiles are thus a key element in the process of digital color management [246].

The Java graphics API supports the use of ICC profiles mainly through the classes ICC_ColorSpace and ICC_Profile, which allow application designers to create various standard profiles and read ICC profiles from data files.

Assume, for example, that an image was recorded with a calibrated scanner and is to be displayed accurately on a monitor. For this purpose, we need the ICC profiles for the scanner and the monitor, which are often supplied by the manufacturers as .icc data files.¹¹ For standard color spaces, the associated ICC profiles are often available as part of the computer installation, such as CIERGB.icc or NTSC1953.icc. With these profiles, a color space object can be specified that converts the image data produced by the scanner into corresponding CIEXYZ or sRGB values, as illustrated by the following example:

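import java.awt.color.ICC_ColorSpace;
import java.awt.color.ICC_Profile;

// load the scanner’s ICC profile and create a corresponding color space
// (ICC_Profile.getInstance() throws an IOException if the file cannot be read):
ICC_ColorSpace scannerCs =
    new ICC_ColorSpace(ICC_Profile.getInstance("scanner.icc"));

// specify a device-specific color:
float[] deviceColor = {0.77f, 0.13f, 0.89f};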

9 Classes LabColorSpace, LuvColorSpace (analogous implementation of the CIELUV color space) and associated auxiliary classes are found in package imagingbook.pub.color.image.

10 International Color Consortium ICC (www.color.org).

11 ICC profile files may also come with extensions .icm or .pf (as in the Java distribution).
Prog. 14.1: Java implementation of the CIELAB color space as a subclass of ColorSpace (part 1). The conversion from D50-based profile connection space XYZ coordinates to CIELAB (Eqn. (14.6)) and back is implemented by the required methods fromCIEXYZ() and toCIEXYZ(), respectively. The auxiliary methods fromCIEXYZ65() and toCIEXYZ65() are used for converting D65-based XYZ coordinates (see Eqn. (14.6)). Chromatic adaptation between D50 and D65 is performed by the objects catD65toD50 and catD50toD65 of type ChromaticAdaptation. The gamma correction functions f1 (Eqn. (14.8)) and f2 (Eqn. (14.15)) are implemented by the methods f1() and f2(), respectively (see Prog. 14.2).

package imagingbook.pub.color.image;

import static imagingbook.pub.color.image.Illuminant.D50;
import static imagingbook.pub.color.image.Illuminant.D65;

import java.awt.color.ColorSpace;

public class LabColorSpace extends ColorSpace {

    // D65 reference white point and chromatic adaptation objects:
    static final double Xref = D65.X;   // 0.950456
    static final double Yref = D65.Y;   // 1.000000
    static final double Zref = D65.Z;   // 1.088754

    static final ChromaticAdaptation catD65toD50 =
            new BradfordAdaptation(D65, D50);
    static final ChromaticAdaptation catD50toD65 =
            new BradfordAdaptation(D50, D65);

    // the only constructor:
    public LabColorSpace() {
        super(TYPE_Lab, 3);
    }

    // XYZ (Profile Connection Space, D50) -> CIELab conversion:
    public float[] fromCIEXYZ(float[] XYZ50) {
        float[] XYZ65 = catD50toD65.apply(XYZ50);
        return fromCIEXYZ65(XYZ65);
    }

    // XYZ (D65) -> CIELab conversion (Eqns. (14.6)-(14.10)):
    public float[] fromCIEXYZ65(float[] XYZ65) {
        double xx = f1(XYZ65[0] / Xref);
        double yy = f1(XYZ65[1] / Yref);
        double zz = f1(XYZ65[2] / Zref);
        float L = (float) (116.0 * yy - 16.0);
        float a = (float) (500.0 * (xx - yy));
        float b = (float) (200.0 * (yy - zz));
        return new float[] {L, a, b};
    }

    // CIELab -> XYZ (Profile Connection Space, D50) conversion:
    public float[] toCIEXYZ(float[] Lab) {
        float[] XYZ65 = toCIEXYZ65(Lab);
        return catD65toD50.apply(XYZ65);
    }

    // CIELab -> XYZ (D65) conversion (Eqns. (14.13)-(14.15)):
    public float[] toCIEXYZ65(float[] Lab) {
        double ll = (Lab[0] + 16.0) / 116.0;
        float Y65 = (float) (Yref * f2(ll));
        float X65 = (float) (Xref * f2(ll + Lab[1] / 500.0));
        float Z65 = (float) (Zref * f2(ll - Lab[2] / 200.0));
        return new float[] {X65, Y65, Z65};
    }

Prog. 14.2: Java implementation of the CIELAB color space as a subclass of ColorSpace (part 2). The methods fromRGB() and toRGB() perform direct conversion between CIELAB and sRGB via D65-based XYZ coordinates, i.e., without conversion to Java’s Profile Connection Space. Gamma correction (for mapping between linear RGB and sRGB component values) is implemented by the methods gammaFwd() and gammaInv() in class sRgbUtil (not shown). The methods f1() and f2() implement the forward and inverse gamma correction of CIELAB components (see Eqns. (14.6) and (14.13)).

    // sRGB -> CIELab conversion:
    public float[] fromRGB(float[] srgb) {
        // get linear rgb components:
        double r = sRgbUtil.gammaInv(srgb[0]);
        double g = sRgbUtil.gammaInv(srgb[1]);
        double b = sRgbUtil.gammaInv(srgb[2]);
        // convert to XYZ (D65-based, Eqn. (14.29)):
        float X = (float) (0.412453 * r + 0.357580 * g + 0.180423 * b);
        float Y = (float) (0.212671 * r + 0.715160 * g + 0.072169 * b);
        float Z = (float) (0.019334 * r + 0.119193 * g + 0.950227 * b);
        float[] XYZ65 = new float[] {X, Y, Z};
        return fromCIEXYZ65(XYZ65);
    }

    // CIELab -> sRGB conversion:
    public float[] toRGB(float[] Lab) {
        float[] XYZ65 = toCIEXYZ65(Lab);
        double X = XYZ65[0];
        double Y = XYZ65[1];
        double Z = XYZ65[2];
        // XYZ -> RGB (linear components, Eqn. (14.28)):
        double r = ( 3.240479 * X + -1.537150 * Y + -0.498535 * Z);
        double g = (-0.969256 * X +  1.875992 * Y +  0.041556 * Z);
        double b = ( 0.055648 * X + -0.204043 * Y +  1.057311 * Z);
        // RGB -> sRGB (nonlinear components):
        float rr = (float) sRgbUtil.gammaFwd(r);
        float gg = (float) sRgbUtil.gammaFwd(g);
        float bb = (float) sRgbUtil.gammaFwd(b);
        return new float[] {rr, gg, bb};
    }

    static final double epsilon = 216.0 / 24389;   // Eqn. (14.9)
    static final double kappa = 841.0 / 108;       // Eqn. (14.10)

    // Gamma correction for L* (forward, Eqn. (14.8)):
    double f1(double c) {
        if (c > epsilon)    // 0.008856
            return Math.cbrt(c);
        else
            return (kappa * c) + (16.0 / 116);
    }

    // Gamma correction for L* (inverse, Eqn. (14.15)):
    double f2(double c) {
        double c3 = c * c * c;
        if (c3 > epsilon)
            return c3;
        else
            return (c - 16.0 / 116) / kappa;
    }

} // end of class LabColorSpace
// convert to sRGB:
float[] RGBColor = scannerCs.toRGB(deviceColor);

// convert to (D50-based) XYZ:
float[] XYZColor = scannerCs.toCIEXYZ(deviceColor);

Similarly,  we  can  calculate  the  accurate  color  values  to  be  sent  to  the 
monitor  by  creating  a  suitable  color  space  object  from  this  device’s 
ICC  profile. 
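Since no real scanner or monitor profile is assumed to be at hand, the following Java sketch uses the built-in sRGB profile, obtained with ICC_Profile.getInstance(ColorSpace.CS_sRGB), as a stand-in for a device profile. It wraps the profile in an ICC_ColorSpace, just as one would do with a profile read from a file, so that toRGB() and toCIEXYZ() become available. The class IccDemo is hypothetical.

```java
import java.awt.color.ColorSpace;
import java.awt.color.ICC_ColorSpace;
import java.awt.color.ICC_Profile;

// Creates an ICC_ColorSpace from a profile; here the built-in sRGB profile
// serves as a stand-in for a device profile loaded from an .icc file.
final class IccDemo {

    static ICC_ColorSpace fromBuiltInSrgb() {
        ICC_Profile profile = ICC_Profile.getInstance(ColorSpace.CS_sRGB);
        return new ICC_ColorSpace(profile);
    }
}
```

Because this particular profile describes sRGB itself, toRGB() returns (up to small numerical deviations) the input color, while toCIEXYZ() yields D50-based PCS coordinates; with a genuine device profile, the same calls perform the device-to-standard conversion described above.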


14.8 Exercises

Exercise 14.1. For chromatic adaptation (defined in Eqn. (14.43)), transformation matrices other than the Bradford model (Eqn. (14.45)) have been proposed; for example, [225],

$$\mathbf{M}_{\mathrm{CAT}} = \begin{pmatrix} 1.2694 & -0.0988 & -0.1706 \\ -0.8364 & 1.8006 & 0.0357 \\ 0.0297 & -0.0315 & 1.0018 \end{pmatrix} \tag{14.50}$$

or

$$\mathbf{M}_{\mathrm{CAT}} = \begin{pmatrix} 0.7982 & 0.3389 & -0.1371 \\ -0.5918 & 1.5512 & 0.0406 \\ 0.0008 & -0.0239 & 0.9753 \end{pmatrix}. \tag{14.51}$$

Derive the complete chromatic adaptation transformations M_50|65 and M_65|50 for converting between D65 and D50 colors, analogous to Eqns. (14.46) and (14.47), for each of the above transformation matrices.

Exercise 14.2. Implement the conversion of an sRGB color image to a colorless (grayscale) sRGB image using the three methods in Eqn. (14.37) (incorrectly applying standard weights to nonlinear R'G'B' components), Eqn. (14.38) (exact computation), and Eqn. (14.39) (approximation using nonlinear components and modified weights). Compare the results by computing difference images, and also determine the total errors.

Exercise 14.3. Write a program to evaluate the errors that are introduced by using nonlinear instead of linear color components for grayscale conversion. To do this, compute the difference between the Y values obtained with the linear variant (Eqn. (14.38)) and the nonlinear variant (Eqn. (14.39) with w'_R = 0.309, w'_G = 0.609, w'_B = 0.082) for all possible 2^24 RGB colors. Let your program return the maximum gray value difference and the sum of the absolute differences for all colors.

Exercise 14.4. Determine the virtual primaries R*, G*, B* obtained by Bradford adaptation (Eqn. (14.42)), with M_CAT as defined in Eqn. (14.45). What are the resulting coordinates in the xy chromaticity diagram? Are the primaries inside the visible color range?
15 Filters for Color Images


Color images are everywhere and filtering them is such a common task that it does not seem to require much attention at all. In this chapter, we describe how classical linear and nonlinear filters, which we covered before in the context of grayscale images (see Ch. 5), can be either used directly or adapted for the processing of color images. Often color images are treated as stacks of intensity images and existing monochromatic filters are simply applied independently to the individual color channels. While this is straightforward and performs satisfactorily in many situations, it does not take into account the vector-valued nature of color pixels as samples taken in a specific, multi-dimensional color space. As we show in this chapter, the outcome of filter operations depends strongly on the working color space and the variations between different color spaces may be substantial. Although this may not be apparent in many situations, it should be of concern if high-quality color imaging is an issue.


15.1  Linear  Filters 

Linear filters are important in many applications, such as smoothing, noise removal, interpolation for geometric transformations, decimation in scale-space transformations, image compression, reconstruction, and edge enhancement. The general properties of linear filters and their use on scalar-valued grayscale images are detailed in Chapter 5, Sec. 5.2. For color images, it is common practice to apply these monochromatic filters separately to each color channel, thereby treating the image as a stack of scalar-valued images. As we describe in the following section, this approach is simple as well as efficient, since existing implementations for grayscale images can be reused without any modification. However, the outcome depends strongly on the choice of the color space in which the filter operation is performed. For example, it makes a great difference whether the channels of an RGB image contain linear or nonlinear component values. This issue is discussed in more detail in Sec. 15.1.2.
15.1.1 Monochromatic Application of Linear Filters

Given a discrete scalar (grayscale) image with elements $I(u,v) \in \mathbb{R}$, the application of a linear filter can be expressed as a linear 2D convolution

$$\hat{I}(u,v) = (I * H)(u,v) = \sum_{(i,j) \in \mathcal{R}_H} I(u-i, v-j) \cdot H(i,j), \tag{15.1}$$

where $H$ denotes a discrete filter kernel defined over the (usually rectangular) region $\mathcal{R}_H$, with $H(i,j) \in \mathbb{R}$. For a vector-valued image $\mathbf{I}$ with $K$ components, the individual picture elements are vectors, that is,


$$\mathbf{I}(u,v) = \begin{pmatrix} I_1(u,v) \\ I_2(u,v) \\ \vdots \\ I_K(u,v) \end{pmatrix}, \tag{15.2}$$

with $\mathbf{I}(u,v) \in \mathbb{R}^K$ or $I_k(u,v) \in \mathbb{R}$, respectively. In this case, the linear filter operation can be generalized to

$$\hat{\mathbf{I}}(u,v) = (\mathbf{I} * H)(u,v) = \sum_{(i,j) \in \mathcal{R}_H} \mathbf{I}(u-i, v-j) \cdot H(i,j), \tag{15.3}$$

with the same scalar-valued filter kernel $H$ as in Eqn. (15.1). Thus the $k$th element of the resulting pixels,

$$\hat{I}_k(u,v) = \sum_{(i,j) \in \mathcal{R}_H} I_k(u-i, v-j) \cdot H(i,j) = (I_k * H)(u,v), \tag{15.4}$$

is simply the result of scalar convolution (Eqn. (15.1)) applied to the corresponding component plane $I_k$. In the case of an RGB color image (with $K = 3$ components), the filter kernel $H$ is applied separately to the scalar-valued R, G, and B planes ($I_R, I_G, I_B$), that is,

$$\hat{\mathbf{I}}(u,v) = \begin{pmatrix} \hat{I}_R(u,v) \\ \hat{I}_G(u,v) \\ \hat{I}_B(u,v) \end{pmatrix} = \begin{pmatrix} (I_R * H)(u,v) \\ (I_G * H)(u,v) \\ (I_B * H)(u,v) \end{pmatrix}. \tag{15.5}$$

Figure  15.1  illustrates  how  linear  filters  for  color  images  are  typically 
implemented  by  individually  filtering  the  three  scalar-valued  color 
components. 
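Eqns. (15.4) and (15.5) amount to an ordinary scalar convolution wrapped in a loop over the component planes. The following Java sketch shows this; the class name, the `img[k][v][u]` array layout (component, row, column) and the clamp-to-edge border handling are assumptions made for illustration, not the book's own implementation.

```java
/**
 * Sketch of the monochromatic application of a linear filter, Eqn. (15.5):
 * the same scalar kernel is convolved with each color plane independently.
 * Assumed layout: img[k][v][u] = component k at column u, row v.
 */
final class ChannelwiseFilter {

    /** Scalar convolution (Eqn. 15.1); kernel h[j][i] has odd size, hot spot at its center. */
    static double[][] convolve(double[][] plane, double[][] h) {
        int rows = plane.length, cols = plane[0].length;
        int rj = h.length / 2, ri = h[0].length / 2;   // kernel radii (vertical, horizontal)
        double[][] out = new double[rows][cols];
        for (int v = 0; v < rows; v++) {
            for (int u = 0; u < cols; u++) {
                double sum = 0;
                for (int j = -rj; j <= rj; j++) {
                    for (int i = -ri; i <= ri; i++) {
                        // I(u - i, v - j) * H(i, j); coordinates clamped at the image border
                        int uu = Math.min(cols - 1, Math.max(0, u - i));
                        int vv = Math.min(rows - 1, Math.max(0, v - j));
                        sum += plane[vv][uu] * h[j + rj][i + ri];
                    }
                }
                out[v][u] = sum;
            }
        }
        return out;
    }

    /** Eqn. (15.4): the k-th output plane is I_k * H, for every component k. */
    static double[][][] filter(double[][][] img, double[][] h) {
        double[][][] out = new double[img.length][][];
        for (int k = 0; k < img.length; k++) {
            out[k] = convolve(img[k], h);
        }
        return out;
    }
}
```

For example, filtering a 1 × 3 image with a kernel whose only nonzero entry is H(−1, 0) = 1 yields an output value of I(u + 1, v) in every channel, which reflects the sign convention of Eqn. (15.1). Any existing grayscale routine could be substituted for `convolve`, which is exactly what makes the channel-wise approach attractive.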


Linear  smoothing  filters 

Smoothing filters are a particular class of linear filters that are found in many applications and are characterized by positive-only filter coefficients. Let $\mathcal{C}_{u,v} = (\mathbf{c}_1, \ldots, \mathbf{c}_n)$ denote the vector of color pixels $\mathbf{c}_m \in \mathbb{R}^K$ contained in the spatial support region of the kernel $H$, placed at position $(u,v)$ in the original image $\mathbf{I}$, where $n$ is the size of $H$. With arbitrary kernel coefficients $H(i,j) \in \mathbb{R}$, the resulting color pixel $\hat{\mathbf{I}}(u,v) = \hat{\mathbf{c}}$ in the filtered image is a linear combination of the original colors in $\mathcal{C}_{u,v}$, that is,

$$\hat{\mathbf{c}} = w_1 \cdot \mathbf{c}_1 + w_2 \cdot \mathbf{c}_2 + \cdots + w_n \cdot \mathbf{c}_n = \sum_{i=1}^{n} w_i \cdot \mathbf{c}_i, \tag{15.6}$$

where $w_m$ is the coefficient in $H$ that corresponds to pixel $\mathbf{c}_m$. If the kernel is normalized (i.e., $\sum H(i,j) = \sum_m w_m = 1$), the result is an affine combination of the original colors. In the case of a typical smoothing filter, with $H$ normalized and all coefficients $H(i,j)$ being positive, any resulting color $\hat{\mathbf{c}}$ is a convex combination of the original color vectors $\mathbf{c}_1, \ldots, \mathbf{c}_n$.

Geometrically this means that the mixed color $\hat{\mathbf{c}}$ is contained within the convex hull of the contributing colors $\mathbf{c}_1, \ldots, \mathbf{c}_n$, as illustrated in Fig. 15.2. In the special case that only two original colors $\mathbf{c}_1, \mathbf{c}_2$ are involved, the result $\hat{\mathbf{c}}$ is located on the straight line segment connecting $\mathbf{c}_1$ and $\mathbf{c}_2$ (Fig. 15.2(b)).²

Fig. 15.1

Monochromatic application of a linear filter. The filter, specified by the kernel $H$, is applied separately to each of the scalar-valued color channels $I_R, I_G, I_B$. Combining the filtered component channels $\hat{I}_R, \hat{I}_G, \hat{I}_B$ produces the filtered color image $\hat{\mathbf{I}}$.


Fig. 15.2

Convex linear color mixtures. The result of the convex combination (mixture) of $n$ color vectors $\mathcal{C} = \{\mathbf{c}_1, \ldots, \mathbf{c}_n\}$ is confined to the convex hull of $\mathcal{C}$ (a). In the special case of only two initial colors $\mathbf{c}_1$ and $\mathbf{c}_2$, any mixed color $\hat{\mathbf{c}}$ is located on the straight line segment connecting $\mathbf{c}_1$ and $\mathbf{c}_2$ (b).

² The convex hull of two points $\mathbf{c}_1, \mathbf{c}_2$ consists of the straight line segment between them.
Fig. 15.3

Linear smoothing filter at a color edge. Discrete filter kernel with positive-only elements and support region $\mathcal{R}$ (a). Filter kernel positioned over a region of constant color $\mathbf{c}_1$ and over a color step edge $\mathbf{c}_1/\mathbf{c}_2$, respectively (b). If the (normalized) filter kernel of extent $\mathcal{R}$ is completely embedded in a region of constant color ($\mathbf{c}_1$), the result of filtering is exactly that same color. At a step edge between two colors $\mathbf{c}_1, \mathbf{c}_2$, one part of the kernel ($\mathcal{R}_1$) covers pixels of color $\mathbf{c}_1$ and the remaining part ($\mathcal{R}_2$) covers pixels of color $\mathbf{c}_2$. In this case, the result is a linear mixture of the colors $\mathbf{c}_1, \mathbf{c}_2$, as illustrated in Fig. 15.2(b).


Response  to  a  color  step  edge 

Assume, as a special case, that the original RGB image $\mathbf{I}$ contains a step edge separating two regions of constant colors $\mathbf{c}_1$ and $\mathbf{c}_2$, respectively, as illustrated in Fig. 15.3(b). If the normalized smoothing kernel $H$ is placed at some position $(u,v)$, where it is fully supported by pixels of identical color $\mathbf{c}_1$, the (trivial) response of the filter is

$$\hat{\mathbf{I}}(u,v) = \sum_{(i,j) \in \mathcal{R}_H} \mathbf{c}_1 \cdot H(i,j) = \mathbf{c}_1 \cdot \sum_{(i,j) \in \mathcal{R}_H} H(i,j) = \mathbf{c}_1 \cdot 1 = \mathbf{c}_1. \tag{15.7}$$


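Thus the result at this position is the original color $\mathbf{c}_1$. Alternatively, if the filter kernel is placed at some position on a color edge (between two colors $\mathbf{c}_1, \mathbf{c}_2$, see again Fig. 15.3(b)), a subset of its coefficients ($\mathcal{R}_1$) is supported by pixels of color $\mathbf{c}_1$, while the other coefficients ($\mathcal{R}_2$) overlap with pixels of color $\mathbf{c}_2$. Since $\mathcal{R}_1$ and $\mathcal{R}_2$ are disjoint with $\mathcal{R}_1 \cup \mathcal{R}_2 = \mathcal{R}_H$, and the kernel is normalized, the resulting color is

$$\hat{\mathbf{c}} = \sum_{(i,j) \in \mathcal{R}_1} \mathbf{c}_1 \cdot H(i,j) + \sum_{(i,j) \in \mathcal{R}_2} \mathbf{c}_2 \cdot H(i,j) \tag{15.8}$$

$$= \mathbf{c}_1 \cdot \underbrace{\sum_{(i,j) \in \mathcal{R}_1} H(i,j)}_{1-s} + \mathbf{c}_2 \cdot \underbrace{\sum_{(i,j) \in \mathcal{R}_2} H(i,j)}_{s} \tag{15.9}$$

$$= \mathbf{c}_1 \cdot (1-s) + \mathbf{c}_2 \cdot s = \mathbf{c}_1 + s \cdot (\mathbf{c}_2 - \mathbf{c}_1), \tag{15.10}$$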

for some $s \in [0,1]$. As we see, the resulting color coordinate $\hat{\mathbf{c}}$ lies on the straight line segment connecting the original colors $\mathbf{c}_1$ and $\mathbf{c}_2$ in the respective color space. Thus, at a step edge between two colors $\mathbf{c}_1, \mathbf{c}_2$, the intermediate colors produced by a (normalized) smoothing filter are located on the straight line between the two original color coordinates. Note that this relationship between linear filtering and linear color mixtures is independent of the particular color space in which the operation is performed.
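Eqn. (15.10) can be checked numerically on a one-dimensional row of RGB colors. In the sketch below, the row, the two colors and the 3-tap kernel are hypothetical example values; `filterAt` computes the weighted sum of Eqn. (15.8) directly, and `mixingWeight` sums the kernel weights that fall on pixels of color c2, i.e., the quantity s in Eqn. (15.9).

```java
import java.util.Arrays;

final class StepEdgeMix {

    /** Normalized 1D filter (odd length, hot spot at center) applied at position x of an RGB row. */
    static double[] filterAt(double[][] row, double[] w, int x) {
        int r = w.length / 2;
        double[] c = new double[3];
        for (int i = -r; i <= r; i++) {
            int xx = Math.min(row.length - 1, Math.max(0, x - i));  // clamp at the borders
            for (int k = 0; k < 3; k++) {
                c[k] += w[i + r] * row[xx][k];
            }
        }
        return c;
    }

    /** s in Eqn. (15.9): total kernel weight that lands on pixels of color c2. */
    static double mixingWeight(double[][] row, double[] w, int x, double[] c2) {
        int r = w.length / 2;
        double s = 0;
        for (int i = -r; i <= r; i++) {
            int xx = Math.min(row.length - 1, Math.max(0, x - i));
            if (Arrays.equals(row[xx], c2)) {
                s += w[i + r];
            }
        }
        return s;
    }

    public static void main(String[] args) {
        double[] c1 = {0.1, 0.2, 0.9}, c2 = {0.9, 0.8, 0.1};
        double[][] row = {c1, c1, c1, c2, c2, c2};   // step edge between x = 2 and x = 3
        double[] w = {0.25, 0.5, 0.25};               // normalized, positive-only kernel
        double[] c = filterAt(row, w, 2);
        double s = mixingWeight(row, w, 2, c2);
        for (int k = 0; k < 3; k++) {
            // filtered component next to c1 + s * (c2 - c1), Eqn. (15.10)
            System.out.printf("%.4f  %.4f%n", c[k], c1[k] + s * (c2[k] - c1[k]));
        }
    }
}
```

At x = 2 only one tap (weight 0.25) reaches the c2 side, so s = 0.25 and both printed columns agree, as Eqn. (15.10) predicts.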
15.1.2 Color Space Considerations

Since a linear filter always yields a convex linear mixture of the involved colors, the result depends on the color space in which the filter operation is performed. For example, Fig. 15.4 shows the intermediate colors produced by a smoothing filter being applied to the same blue/yellow step edge but in different color spaces: sRGB, linear RGB, CIELUV, and CIELAB. As we see, the differences between the various color spaces are substantial. To obtain dependable and standardized results it might be reasonable to first transform the input image to a particular operating color space, perform the required filter operation, and finally transform the result back to the original color space, as illustrated in Fig. 15.5.
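As a small illustration of this "transform, filter, transform back" scheme, the sketch below mixes two colors with equal weights (a two-tap box filter), once directly on the sRGB components and once in linear RGB. To keep it short, linear RGB stands in for CIELAB as the working space; the transfer functions are the standard sRGB ones (IEC 61966-2-1), and the class and method names are hypothetical.

```java
final class WorkingSpaceFilter {

    /** sRGB component in [0,1] to linear RGB (IEC 61966-2-1 transfer function). */
    static double srgbToLinear(double c) {
        return (c <= 0.04045) ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
    }

    /** Linear RGB component in [0,1] back to sRGB. */
    static double linearToSrgb(double c) {
        return (c <= 0.0031308) ? 12.92 * c : 1.055 * Math.pow(c, 1 / 2.4) - 0.055;
    }

    /** 50/50 mixture computed directly on the (nonlinear) sRGB components. */
    static double[] mixInSrgb(double[] a, double[] b) {
        double[] m = new double[3];
        for (int k = 0; k < 3; k++) {
            m[k] = 0.5 * (a[k] + b[k]);
        }
        return m;
    }

    /** Same mixture in linear RGB: transform (T), filter, transform back (T^-1). */
    static double[] mixInLinearRgb(double[] a, double[] b) {
        double[] m = new double[3];
        for (int k = 0; k < 3; k++) {
            double lin = 0.5 * (srgbToLinear(a[k]) + srgbToLinear(b[k]));
            m[k] = linearToSrgb(lin);
        }
        return m;
    }

    public static void main(String[] args) {
        double[] blue = {0, 0, 1}, yellow = {1, 1, 0};
        System.out.println(java.util.Arrays.toString(mixInSrgb(blue, yellow)));
        System.out.println(java.util.Arrays.toString(mixInLinearRgb(blue, yellow)));
    }
}
```

For the blue/yellow pair the sRGB mixture is the gray (0.5, 0.5, 0.5), whereas the mixture computed in linear RGB is a noticeably lighter gray of about 0.735 per sRGB component, which is one instance of the differences plotted in Fig. 15.4.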


Obviously, a linear filter implies certain "metric" properties of the underlying color space. If we assume that a certain color space $S_A$ has this property, then this is also true for any color space $S_B$ that is related to $S_A$ by a linear transformation, such as CIEXYZ and linear RGB space (see Ch. 14, Sec. 14.4.1). However, many color spaces used in practice (sRGB in particular) are related to these reference color spaces by highly nonlinear mappings, and thus significant deviations can be expected.

Preservation  of  brightness  (luminance) 

Apart from the intermediate colors produced by interpolation, another important (and easily measurable) aspect is the resulting change of brightness or luminance across the filter region. In particular, it should generally hold that the luminance of the filtered color image is identical to the result of filtering only the (scalar) luminance channel of the original image with the same kernel $H$. Thus, if $\mathrm{Lum}(\mathbf{I})$ denotes the luminance of the original color image and $\mathrm{Lum}(\mathbf{I} * H)$ is the luminance of the filtered image, it should hold that

$$\mathrm{Lum}(\mathbf{I} * H) = \mathrm{Lum}(\mathbf{I}) * H. \tag{15.11}$$

This is only possible if $\mathrm{Lum}(\cdot)$ is a linear function of the components of the associated color space, which is mostly not the case. From Eqn. (15.11) we also see that, when filtering a step edge with colors $\mathbf{c}_1$ and $\mathbf{c}_2$, the resulting brightness should also change monotonically from $\mathrm{Lum}(\mathbf{c}_1)$ to $\mathrm{Lum}(\mathbf{c}_2)$ and, in particular, none of the intermediate brightness values should fall outside this range.

Fig. 15.4

Intermediate colors produced by linear interpolation between yellow and blue, performed in different color spaces. The 3D plot shows the resulting colors in linear RGB space.

Fig. 15.5

Linear filter operation performed in a "foreign" color space. The original RGB image $\mathbf{I}_{\mathrm{RGB}}$ is first transformed to CIELAB (by $T$), where the linear filter is applied separately to the three channels $I_{L^*}, I_{a^*}, I_{b^*}$. The filtered RGB image $\hat{\mathbf{I}}_{\mathrm{RGB}}$ is obtained by transforming back from CIELAB to RGB (by $T^{-1}$).

Figure 15.6 shows the results of filtering a synthetic test image with a normalized Gaussian kernel (with σ = 3) in different color spaces. Differences are most notable at the red-blue and green-magenta transitions, with particularly large deviations in the sRGB space. The corresponding luminance values Y (calculated from linear RGB components as in Eqn. (12.35)) are shown in Fig. 15.6(g–j). Again conspicuous is the result for sRGB (Fig. 15.6(c, g)), which exhibits transitions at the red-blue, magenta-blue, and magenta-green edges where the resulting brightness drops below the original brightness of both contributing colors. Thus Eqn. (15.11) is not satisfied in this case. On the other hand, filtering in linear RGB space tends to produce too high brightness values, as can be seen at the black-white markers in Fig. 15.6(d, h).
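The brightness drop at red-blue edges can be reproduced in a few lines. This sketch assumes the ITU-R BT.709 luminance weights (0.2126, 0.7152, 0.0722) applied to linearized components, which we take to correspond to Eqn. (12.35) up to rounding, and compares a 50/50 red/blue mixture computed on sRGB components with one computed in linear RGB. The class name is hypothetical.

```java
final class LuminanceCheck {

    /** sRGB component in [0,1] to linear RGB (IEC 61966-2-1). */
    static double srgbToLinear(double c) {
        return (c <= 0.04045) ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
    }

    /** Luminance Y of an sRGB color from its linearized components (BT.709 weights assumed). */
    static double luminance(double[] srgb) {
        return 0.2126 * srgbToLinear(srgb[0])
             + 0.7152 * srgbToLinear(srgb[1])
             + 0.0722 * srgbToLinear(srgb[2]);
    }

    public static void main(String[] args) {
        double yRed = luminance(new double[] {1, 0, 0});
        double yBlue = luminance(new double[] {0, 0, 1});
        // 50/50 mixture filtered directly on sRGB components: (0.5, 0, 0.5)
        double ySrgbMix = luminance(new double[] {0.5, 0, 0.5});
        // 50/50 mixture filtered in linear RGB: Y is linear there, so it is the mean of both Y values
        double yLinearMix = 0.5 * (yRed + yBlue);
        System.out.printf("Y(red) = %.4f, Y(blue) = %.4f%n", yRed, yBlue);
        System.out.printf("Y(sRGB mix) = %.4f, Y(linear mix) = %.4f%n", ySrgbMix, yLinearMix);
    }
}
```

The sRGB mixture comes out at roughly Y ≈ 0.061, below both Y(red) = 0.2126 and Y(blue) = 0.0722, so Eqn. (15.11) is violated; the linear-RGB mixture (about 0.142) stays between the two.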

Out-of-gamut  colors 

If we apply a linear filter in RGB or sRGB space, the resulting intermediate colors are always valid RGB colors again and contained in the original RGB gamut volume. However, when transformed to CIELUV or CIELAB, the set of possible RGB or sRGB colors forms a nonconvex shape (see Ch. 14, Fig. 14.5), such that linearly interpolated colors may fall outside the RGB gamut volume. Particularly critical (in both CIELUV and CIELAB) are the red-white, red-yellow, and red-magenta transitions, as well as yellow-green in CIELAB, where the resulting distances from the gamut surface can be quite large (see Fig. 15.7). During back-transformation to the original color space, such "out-of-gamut" colors must receive special treatment, since simple clipping of the affected components may cause unacceptable color distortions [167].
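To see how an interpolated color can leave the RGB gamut, the following sketch interpolates between red (1, 0, 0) and yellow (1, 1, 0) in CIELAB and converts the result back to linear RGB. It assumes sRGB primaries with a D65 white point (the common 4-digit conversion matrices) and the usual CIE 1976 L\*a\*b\* formulas; since both colors have only 0/1 components, no sRGB gamma step is needed. All names are hypothetical.

```java
final class LabGamutCheck {

    // Linear RGB <-> CIEXYZ for sRGB primaries and D65 white (4-digit matrices).
    static final double[][] RGB_TO_XYZ = {
        {0.4124, 0.3576, 0.1805},
        {0.2126, 0.7152, 0.0722},
        {0.0193, 0.1192, 0.9505}};
    static final double[][] XYZ_TO_RGB = {
        { 3.2406, -1.5372, -0.4986},
        {-0.9689,  1.8758,  0.0415},
        { 0.0557, -0.2040,  1.0570}};
    static final double[] WHITE = {0.95047, 1.0, 1.08883};   // D65 reference white (X, Y, Z)

    static double[] mul(double[][] m, double[] v) {
        double[] r = new double[3];
        for (int i = 0; i < 3; i++) {
            r[i] = m[i][0] * v[0] + m[i][1] * v[1] + m[i][2] * v[2];
        }
        return r;
    }

    // CIE 1976 companding function and its inverse.
    static double f(double t) {
        return (t > 0.008856) ? Math.cbrt(t) : 7.787 * t + 16.0 / 116;
    }

    static double fInv(double t) {
        return (t > 6.0 / 29) ? t * t * t : (t - 16.0 / 116) / 7.787;
    }

    static double[] rgbToLab(double[] rgb) {
        double[] xyz = mul(RGB_TO_XYZ, rgb);
        double fx = f(xyz[0] / WHITE[0]), fy = f(xyz[1] / WHITE[1]), fz = f(xyz[2] / WHITE[2]);
        return new double[] {116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)};
    }

    static double[] labToRgb(double[] lab) {
        double fy = (lab[0] + 16) / 116;
        double fx = fy + lab[1] / 500;
        double fz = fy - lab[2] / 200;
        double[] xyz = {WHITE[0] * fInv(fx), WHITE[1] * fInv(fy), WHITE[2] * fInv(fz)};
        return mul(XYZ_TO_RGB, xyz);
    }

    /** Linear interpolation c1 + s * (c2 - c1) performed in CIELAB; result in linear RGB. */
    static double[] mixInLab(double[] rgb1, double[] rgb2, double s) {
        double[] a = rgbToLab(rgb1), b = rgbToLab(rgb2), m = new double[3];
        for (int k = 0; k < 3; k++) {
            m[k] = a[k] + s * (b[k] - a[k]);
        }
        return labToRgb(m);
    }

    public static void main(String[] args) {
        double[] red = {1, 0, 0}, yellow = {1, 1, 0};
        for (int step = 0; step <= 4; step++) {
            double s = step / 4.0;
            double[] c = mixInLab(red, yellow, s);
            System.out.printf("s=%.2f  R=%.3f G=%.3f B=%.3f%n", s, c[0], c[1], c[2]);
        }
    }
}
```

At s = 0.5 the red component comes out at roughly 1.08, i.e., outside the [0, 1] range, which is the kind of excursion plotted in Fig. 15.7(a); clipping it back to 1 would alter the interpolated color.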

Implications  and  further  reading 

Applying a linear filter to the individual component channels of a color image presumes a certain "linearity" of the underlying color space. Smoothing filters implicitly perform additive linear mixing and interpolation. Despite common practice (and as demonstrated by the results), there is no justification for performing a linear filter operation directly on gamma-mapped sRGB components. However, contrary to expectation, filtering in linear RGB does not yield better overall results either. In summary, both nonlinear sRGB and linear RGB color spaces are unsuitable for linear filtering if perceptually accurate results are desired. Perceptually uniform color spaces, such as CIELUV and CIELAB, are good choices for linear filtering because of their metric properties, with CIELUV being perhaps slightly superior when it comes to interpolation over large color distances. When using CIELUV or CIELAB as intermediate color spaces for filtering RGB images, one must consider that out-of-gamut colors may be produced that must be handled properly. Thus none of the existing standard color spaces is universally suited or even "ideal" with respect to linear filtering.


Fig. 15.6

Gaussian smoothing performed in different color spaces. Synthetic color image (a) and corresponding luminance image (b). The test image contains a horizontal bar with reduced color saturation but the same luminance as its surround, i.e., it is invisible in the luminance image. Gaussian filter applied in different color spaces: sRGB (c), linear RGB (d), CIELUV (e), and CIELAB (f). The bottom row (g–j) shows the corresponding luminance (Y) images. Note the dark bands in the sRGB result (c), particularly along the color boundaries between regions B and E, C and D, and D and E, which stand out clearly in the corresponding luminance image (g). Filtering in linear RGB space (d, h) gives good results between highly saturated colors, but subjectively too high luminance in unsaturated regions, which is apparent around the gray markers. Results with CIELUV (e, i) and CIELAB color spaces (f, j) appear most consistent as far as the preservation of luminance is concerned.

Fig. 15.7

Out-of-gamut colors produced by linear interpolation between red and yellow in "foreign" color spaces. The graphs in (a) show the (linear) R, G, B component values and the luminance Y (gray curves) resulting from a linear filter between red and yellow performed in different color spaces. The graphs show that the red component runs significantly outside the RGB gamut for both CIELUV and CIELAB. In (b) all pixels with any component outside the RGB gamut by more than 1% are marked white (for filtering in CIELAB).


The proper choice of the working color space is relevant not only to smoothing filters, but also to other types of filters, such as linear interpolation filters for geometric image transformations, decimation filters used in multi-scale techniques, and also nonlinear filters that involve averaging colors or calculating color distances, such as the vector median filter (see Sec. 15.2.2). While complex color space transformations in the context of filtering (e.g., sRGB ↔ CIELUV) are usually avoided for performance reasons, they should certainly be considered when high-quality results are important.

Although the issues related to color mixtures and interpolation have been investigated for some time (see, e.g., [149, 258]), their relevance to image filtering has not received much attention in the literature. Most image processing tools (including commercial software) apply linear filters directly to color images, without proper linearization or color space conversion. Lindbloom [149] was among the first to describe the problem of accurate color reproduction, particularly in the context of computer graphics and photo-realistic imagery. He also emphasized the relevance of perceptual uniformity for color processing and recommended the use of CIELUV as a suitable (albeit not perfect) processing space. Tomasi and Manduchi [229] suggested the use of the Euclidean distance in CIELAB space as "most natural" for bilateral filtering applied to color images (see also Ch. 17, Sec. 17.2), and similar arguments are put forth in [109]. Van de Weijer [239] notes that the additional chromaticities introduced by linear smoothing are "visually unacceptable" and argues for the use of nonlinear operators as an alternative. Lukac et al. [156] mention "certain inaccuracies" and color artifacts related to the application of scalar filters and discuss the issue of choosing a proper distance metric for vector-based filters. The practical use of alternative color spaces for image filtering is described in [141, Ch. 5].


15.1.3  Linear  Filtering  with  Circular  Values 


If any of the color components is a circular quantity, such as the hue component in the HSV and HLS color spaces (see Ch. 12, Sec. 12.2.3), linear filters cannot be applied directly without additional provisions. As described in the previous section, a linear filter effectively calculates a weighted average over the values inside the filter region. Since the hue component represents a revolving angle and exhibits a discontinuity at the 1 → 0 (i.e., 360° → 0°) transition, simply averaging this quantity is not admissible (see Fig. 15.8). For example, the plain average of the hue angles 350° and 10° is 180° (cyan), although both input hues are close to red (0°).

However, correct interpolation of angular data is possible by utilizing the corresponding cosine and sine values, without any special treatment of discontinuities [69]. Given two angles $\alpha_1, \alpha_2$, the average angle $\bar{\alpha}_{12}$ can be calculated as³


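$$\bar{\alpha}_{12} = \tan^{-1}\!\left(\frac{\sin(\alpha_1) + \sin(\alpha_2)}{\cos(\alpha_1) + \cos(\alpha_2)}\right) \tag{15.12}$$

$$= \mathrm{ArcTan}\bigl(\cos(\alpha_1) + \cos(\alpha_2),\ \sin(\alpha_1) + \sin(\alpha_2)\bigr) \tag{15.13}$$

and, in general, multiple angular values $\alpha_1, \ldots, \alpha_n$ can be correctly averaged in the form

$$\bar{\alpha} = \mathrm{ArcTan}\Bigl(\sum_{i=1}^{n} \cos(\alpha_i),\ \sum_{i=1}^{n} \sin(\alpha_i)\Bigr). \tag{15.14}$$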


³ See Sec. A.1 in the Appendix for the definition of the ArcTan() function.
Fig. 15.8

Naive linear filtering in HSV color space. Original RGB color image (a) and the associated HSV hue component $I_h$ (b), with values in the range [0, 1). Hue component after direct application of a Gaussian blur filter $H$ with σ = 3.0 (c). Reconstructed RGB image $\hat{\mathbf{I}}$ after filtering all components in HSV space (d). Note the false colors introduced around the 0 → 1 discontinuity (near red) of the hue component.


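Also, the calculation of a weighted average is possible in the same way, that is,

$$\bar{\alpha} = \mathrm{ArcTan}\Bigl(\sum_{i=1}^{n} w_i \cdot \cos(\alpha_i),\ \sum_{i=1}^{n} w_i \cdot \sin(\alpha_i)\Bigr), \tag{15.15}$$

without any additional provisions; even the weights need not be normalized, since scaling both arguments of ArcTan() by the same positive factor does not change the resulting angle. This approach can be used for linearly filtering circular data in general.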
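Eqn. (15.15) translates directly into code. One detail worth flagging: the book's ArcTan(x, y) takes the cosine part first, whereas Java's `Math.atan2(y, x)` takes the sine part first. The class name below is hypothetical.

```java
final class CircularMean {

    /**
     * Weighted circular mean of angles in radians, Eqn. (15.15).
     * Weights need not sum to 1. Returns an angle in [0, 2*pi).
     */
    static double mean(double[] alpha, double[] w) {
        double sumCos = 0, sumSin = 0;
        for (int i = 0; i < alpha.length; i++) {
            sumCos += w[i] * Math.cos(alpha[i]);
            sumSin += w[i] * Math.sin(alpha[i]);
        }
        double a = Math.atan2(sumSin, sumCos);   // = ArcTan(sumCos, sumSin), in [-pi, pi]
        a = (a < 0) ? a + 2 * Math.PI : a;        // a mod 2*pi
        return (a >= 2 * Math.PI) ? 0 : a;        // guard against rounding up to exactly 2*pi
    }

    public static void main(String[] args) {
        double[] alpha = {Math.toRadians(350), Math.toRadians(10)};
        double naive = Math.toDegrees(0.5 * (alpha[0] + alpha[1]));
        double circular = Math.toDegrees(mean(alpha, new double[] {1, 1}));
        System.out.printf("naive: %.1f deg, circular: %.3f deg%n", naive, circular);
    }
}
```

For the hues 350° and 10°, the naive arithmetic mean is 180°, while `mean` returns 0° (modulo 360°). Multiplying all weights by a common factor, e.g. (3, 1) versus (6, 2), leaves the result unchanged.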

Filtering  the  hue  component  in  HSV  color  space 

To apply a linear filter $H$ to the circular hue component $I_h$ (with original values in [0, 1)) of an HSV or HLS image (see Ch. 12, Sec. 12.2.3), we first calculate the corresponding sine and cosine parts $I_h^{\sin}$ and $I_h^{\cos}$ by

$$I_h^{\sin}(u,v) = \sin\bigl(2\pi \cdot I_h(u,v)\bigr), \qquad I_h^{\cos}(u,v) = \cos\bigl(2\pi \cdot I_h(u,v)\bigr), \tag{15.16}$$

with resulting values in the range [−1, 1]. These are then filtered individually, that is,

$$\hat{I}_h^{\sin} = I_h^{\sin} * H, \qquad \hat{I}_h^{\cos} = I_h^{\cos} * H. \tag{15.17}$$

Finally, the filtered hue component $\hat{I}_h$ is obtained in the form

$$\hat{I}_h(u,v) = \frac{1}{2\pi} \cdot \Bigl[\mathrm{ArcTan}\bigl(\hat{I}_h^{\cos}(u,v),\ \hat{I}_h^{\sin}(u,v)\bigr) \bmod 2\pi\Bigr], \tag{15.18}$$

with values again in the range [0, 1).

Fig. 15.9 demonstrates the correct application of a Gaussian smoothing filter to the hue component of an HSV color image by separation into cosine and sine parts. The other two HSV components (S, V) are non-circular and were filtered as usual. In contrast to the result in Fig. 15.8(d), no false colors are produced at the 0 → 1 boundary. In this context it is helpful to look at the distribution of the hue values, which are clustered around 0/1 in the sample image (see Fig. 15.10(a)). In Fig. 15.10(b) we can clearly see how naive filtering of the hue component produces new (false) colors in the middle of the histogram. This does not occur when the hue component is filtered correctly (see Fig. 15.10(c)).

Fig. 15.9

Correct filtering of the HSV hue component by separation into cosine and sine parts (see Fig. 15.8(a) for the original image). Cosine and sine parts $I_h^{\sin}, I_h^{\cos}$ of the hue component before (a, b) and after the application of a Gaussian blur filter with σ = 3.0 (c, d). Smoothed hue component $\hat{I}_h$ after merging the filtered cosine and sine parts $\hat{I}_h^{\sin}, \hat{I}_h^{\cos}$ (e). Reconstructed RGB image $\hat{\mathbf{I}}$ after filtering all HSV components (f). It is apparent that the hard 0/1 hue transitions in (e) are in fact only gradual color changes around the red hues. The other HSV components (S, V, which are non-circular) were filtered in the usual way. The reconstructed RGB image (f) shows no false colors and all hues correctly filtered.
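The difference between naive and circular hue filtering is already visible on a single row of hue values. The sketch below implements Eqns. (15.16)–(15.18) in 1D and compares them with plain convolution of the hue values; the example hues, the 3-tap box kernel, the clamp-to-edge border handling and the class name are assumptions of this illustration.

```java
final class HueRowFilter {

    /** 1D convolution, odd-length kernel with hot spot at center, clamp-to-edge borders. */
    static double[] convolve(double[] x, double[] h) {
        int r = h.length / 2;
        double[] y = new double[x.length];
        for (int u = 0; u < x.length; u++) {
            double sum = 0;
            for (int i = -r; i <= r; i++) {
                int uu = Math.min(x.length - 1, Math.max(0, u - i));
                sum += x[uu] * h[i + r];
            }
            y[u] = sum;
        }
        return y;
    }

    /** Circular hue filtering: split (15.16), filter both parts (15.17), recombine (15.18). */
    static double[] filterHue(double[] hue, double[] h) {
        int n = hue.length;
        double[] sin = new double[n], cos = new double[n];
        for (int u = 0; u < n; u++) {
            sin[u] = Math.sin(2 * Math.PI * hue[u]);
            cos[u] = Math.cos(2 * Math.PI * hue[u]);
        }
        double[] sinF = convolve(sin, h), cosF = convolve(cos, h);
        double[] out = new double[n];
        for (int u = 0; u < n; u++) {
            double theta = Math.atan2(sinF[u], cosF[u]);   // book: ArcTan(cos, sin), in [-pi, pi]
            if (theta < 0) theta += 2 * Math.PI;           // theta mod 2*pi
            double t = theta / (2 * Math.PI);
            out[u] = (t >= 1.0) ? 0.0 : t;                 // keep the result in [0, 1)
        }
        return out;
    }

    public static void main(String[] args) {
        double[] hue = {0.9, 0.95, 0.05, 0.1};             // reddish hues around the 0/1 wrap
        double[] box = {1.0 / 3, 1.0 / 3, 1.0 / 3};
        System.out.println("naive:    " + java.util.Arrays.toString(convolve(hue, box)));
        System.out.println("circular: " + java.util.Arrays.toString(filterHue(hue, box)));
    }
}
```

At position 1 the naive result is about 0.633, a blue hue that occurs nowhere in the input, while the circular result is about 0.966 and thus still red.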

Saturation-weighted filtering

The method just described does not take into account that in HSV (and HLS) the hue and saturation components are closely related. In particular, the hue angle may be very inaccurate (or even indeterminate) if the associated saturation value goes to zero. For example, the test image in Fig. 15.8(a) contains a bright patch in the lower right-hand corner, where the saturation is low and the hue value is quite unstable, as seen in Fig. 15.9(a, b). However, the circular filter defined in Eqns. (15.16)–(15.18) treats all color samples as equally significant.

A simple solution is to use the saturation value $I_s(u,v)$ as a weight factor for the associated pixel [98], by modifying Eqn. (15.16) to

$$I_h^{\sin}(u,v) = I_s(u,v) \cdot \sin\bigl(2\pi \cdot I_h(u,v)\bigr), \qquad I_h^{\cos}(u,v) = I_s(u,v) \cdot \cos\bigl(2\pi \cdot I_h(u,v)\bigr). \tag{15.19}$$

1:  HsvLinearFilter(I_hsv, H)
    Input: I_hsv = (I_h, I_s, I_v), an HSV color image of size M × N, with all components in [0, 1]; H, a 2D filter kernel. Returns a new (filtered) HSV color image of size M × N.

2:    (M, N) ← Size(I_hsv)
3:    Create 2D maps I_h^sin, I_h^cos, Î_h : M × N → ℝ

      Split the hue channel into sine/cosine parts:
4:    for all (u, v) ∈ M × N do
5:        θ ← 2π · I_h(u, v)                        ▷ hue angle θ ∈ [0, 2π]
6:        s ← I_s(u, v)                             ▷ saturation s ∈ [0, 1]
7:        I_h^sin(u, v) ← s · sin(θ)                ▷ I_h^sin(u, v) ∈ [−1, 1]
8:        I_h^cos(u, v) ← s · cos(θ)                ▷ I_h^cos(u, v) ∈ [−1, 1]

      Filter all components with the same kernel:
9:    Î_h^sin ← I_h^sin ∗ H
10:   Î_h^cos ← I_h^cos ∗ H
11:   Î_s ← I_s ∗ H
12:   Î_v ← I_v ∗ H

      Reassemble the filtered hue channel:
13:   for all (u, v) ∈ M × N do
14:       θ ← ArcTan(Î_h^cos(u, v), Î_h^sin(u, v))  ▷ θ ∈ [−π, π]
15:       Î_h(u, v) ← (1/2π) · (θ mod 2π)            ▷ Î_h(u, v) ∈ [0, 1]

16:   Î_hsv ← (Î_h, Î_s, Î_v)
17:   return Î_hsv

Alg. 15.1

Linear filtering in HSV color space. All component values of the original HSV image are in the range [0, 1]. The algorithm considers the circular nature of the hue component and uses the saturation component (in line 6) as a weight factor, as defined in Eqn. (15.19). The same filter kernel H is applied to all three color components (lines 9–12).


Fig. 15.10

Histogram of the HSV hue component before and after linear filtering. Original distribution of hue values $I_h$ (a), showing that colors are clustered around the 0/1 discontinuity (red). Result after naive filtering of the hue component (b), after filtering separated cosine and sine parts (c), and after additional weighting with saturation values (d). The bottom row shows the isolated hue component (color angle) using the corresponding colors (saturation and value set to 100%). Note the noisy spot in the lower right-hand corner of (a), where color saturation is low and hue angles are very unstable.


All other steps in Eqns. (15.17)–(15.18) remain unchanged. The complete process is summarized in Alg. 15.1. The result in Fig. 15.10(d) shows that, particularly in regions of low color saturation, more stable hue values can be expected. Note that no normalization of the weights is required because the calculation of the hue angles (with the ArcTan() function in Eqn. (15.18)) only considers the ratio of the resulting sine and cosine parts.
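A direct Java transcription of Alg. 15.1 might look as follows. The plane representation (one `double[rows][cols]` array per HSV component), the centered odd-size kernel, the clamp-to-edge borders and the class name are assumptions of this sketch. Where the saturation is zero throughout a neighborhood, the true hue is undefined and the sketch simply returns whatever `Math.atan2` yields for (near-)zero arguments.

```java
final class HsvLinearFilter {

    /** 2D convolution, odd-size kernel h[j][i] with hot spot at center, clamp-to-edge borders. */
    static double[][] convolve(double[][] p, double[][] h) {
        int rows = p.length, cols = p[0].length;
        int rj = h.length / 2, ri = h[0].length / 2;
        double[][] out = new double[rows][cols];
        for (int v = 0; v < rows; v++) {
            for (int u = 0; u < cols; u++) {
                double sum = 0;
                for (int j = -rj; j <= rj; j++) {
                    for (int i = -ri; i <= ri; i++) {
                        int uu = Math.min(cols - 1, Math.max(0, u - i));
                        int vv = Math.min(rows - 1, Math.max(0, v - j));
                        sum += p[vv][uu] * h[j + rj][i + ri];
                    }
                }
                out[v][u] = sum;
            }
        }
        return out;
    }

    /** Saturation-weighted HSV filtering; returns the filtered planes {hue, sat, val}. */
    static double[][][] filter(double[][] ih, double[][] is, double[][] iv, double[][] h) {
        int rows = ih.length, cols = ih[0].length;
        double[][] hSin = new double[rows][cols], hCos = new double[rows][cols];
        for (int v = 0; v < rows; v++) {
            for (int u = 0; u < cols; u++) {
                double theta = 2 * Math.PI * ih[v][u];   // hue angle
                double s = is[v][u];                     // saturation as weight, Eqn. (15.19)
                hSin[v][u] = s * Math.sin(theta);
                hCos[v][u] = s * Math.cos(theta);
            }
        }
        // same kernel for all maps (Alg. 15.1, lines 9-12)
        double[][] hSinF = convolve(hSin, h), hCosF = convolve(hCos, h);
        double[][] sF = convolve(is, h), vF = convolve(iv, h);
        double[][] hF = new double[rows][cols];
        for (int v = 0; v < rows; v++) {
            for (int u = 0; u < cols; u++) {
                double theta = Math.atan2(hSinF[v][u], hCosF[v][u]);   // ArcTan(cos, sin)
                if (theta < 0) theta += 2 * Math.PI;                   // theta mod 2*pi
                double t = theta / (2 * Math.PI);
                hF[v][u] = (t >= 1.0) ? 0.0 : t;
            }
        }
        return new double[][][] {hF, sF, vF};
    }
}
```

As a usage example, consider a 1 × 3 image with hues (0.9, 0.05, 0.5) and saturations (1, 1, 0) filtered with a 1 × 3 box kernel. The middle pixel receives the hue 0.975 (351°), halfway around the wrap between 0.9 and 0.05, because the unsaturated pixel with hue 0.5 carries zero weight.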
In many practical image processing applications, linear filters are of limited use and nonlinear filters, such as the median filter, are applied instead.⁴ In particular, for effective noise removal, nonlinear filters are usually the better choice. However, as with linear filters, the techniques originally developed for scalar (grayscale) images do not transfer seamlessly to vector-based color data. One reason is that, unlike scalar data, multidimensional data have no natural ordering relation. As a consequence, nonlinear filters of the scalar type are often applied separately to the individual color channels, and again one must be cautious about the intermediate colors being introduced by these types of filters.

In the remainder of this section we describe the application of the classic (scalar) median filter to color images, a vector-based version of the median filter, and edge-preserving smoothing filters designed for color images. Additional filters for color images are presented in Chapter 17.


15.2.1  Scalar  Median  Filter 

Applying a median filter with support region $\mathcal{R}$ (e.g., a disk-shaped region) at some image position $(u,v)$ means selecting the one pixel value that is most representative of the pixels in $\mathcal{R}$ and using it to replace the current center pixel (hot spot). For the median filter, this representative is the statistical median of the pixels in $\mathcal{R}$. Since we always select the value of one of the existing image pixels, the median filter does not introduce any new pixel values that were not contained in the original image.

If a median filter is applied independently to the components of a color image, each channel is treated as a scalar image, like a single grayscale image. In this case, with the support region $\mathcal{R}$ centered at some point $(u,v)$, the median for each color channel will typically originate from a different spatial position in $\mathcal{R}$, as illustrated in Fig. 15.11. Thus the components of the resulting color vector are generally collected from more than one pixel in $\mathcal{R}$; the color placed in the filtered image may therefore not match any of the original colors, and new colors may be generated that were not contained in the original image. Despite its obvious deficiencies, the scalar (monochromatic) median filter is used in many popular image processing environments (including Photoshop and ImageJ) as the standard median filter for color images.
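The effect described above is easy to reproduce. In the following sketch (hypothetical class name, integer RGB components, odd number of samples), the component-wise median of a pure red, a pure green and a pure blue pixel is black, a color that does not occur in the filter region at all.

```java
import java.util.Arrays;

final class ScalarColorMedian {

    /** Component-wise (scalar) median of n colors, n odd, as in Fig. 15.11. */
    static int[] channelMedian(int[][] colors) {
        int n = colors.length, numChannels = colors[0].length;
        int[] median = new int[numChannels];
        int[] values = new int[n];
        for (int k = 0; k < numChannels; k++) {
            for (int i = 0; i < n; i++) {
                values[i] = colors[i][k];   // k-th component of every color in the region
            }
            Arrays.sort(values);
            median[k] = values[n / 2];
        }
        return median;
    }

    public static void main(String[] args) {
        int[][] region = {{255, 0, 0}, {0, 255, 0}, {0, 0, 255}};   // pure red, green, blue
        System.out.println(Arrays.toString(channelMedian(region))); // prints [0, 0, 0]
    }
}
```

Each channel's median comes from a different pixel, so the combined vector need not be any of the input colors.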


15.2.2  Vector  Median  Filter 


The scalar median filter is based on the concept of rank ordering, that is, it assumes that the underlying data can be ordered and sorted. However, no such natural ordering exists for data elements that are vectors. Although vectors can be sorted in many different ways, for example by length or lexicographically along their dimensions, it is usually impossible to define a useful greater-than relation between any pair of vectors.

⁴ See also Chapter 5, Sec. 5.4.

Fig. 15.11

Scalar median filter applied separately to color channels. With the filter region $\mathcal{R}$ centered at some point $(u,v)$, the median pixel value is generally found at different locations in the R, G, B channels of the original image. The components of the resulting RGB color vector are collected from spatially separated pixels. It thus may not match any of the colors in the original image.

One can show, however, that the median of a sequence of n scalar values $P = (p_1, \ldots, p_n)$ can also be defined as the value $p_m$ selected from $P$, such that

$$\sum_{i=1}^{n} |p_m - p_i| \;\le\; \sum_{i=1}^{n} |p_j - p_i| \qquad (15.20)$$

holds for any $p_j \in P$. In other words, the median value $p_m = \mathrm{median}(P)$ is the one for which the sum of the absolute differences to all other elements in the sequence $P$ is the smallest.
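Eqn. (15.20) can be checked experimentally (cf. Exercise 15.1) with a short Java sketch; the class name `MedianByDistance` is hypothetical. It compares the sample with the smallest sum of absolute differences against the conventional sort-and-pick median, assuming an odd number of samples (for even n the minimizer is not unique).

```java
import java.util.Arrays;

/** Sketch: the scalar median as the sample with the smallest sum of absolute differences, Eqn. (15.20). */
final class MedianByDistance {

    /** Sum of |p - p_i| over all samples p_i. */
    static double sumAbsDiff(double p, double[] samples) {
        double sum = 0;
        for (double q : samples) {
            sum += Math.abs(p - q);
        }
        return sum;
    }

    /** Returns the sample p_m that minimizes the sum of absolute differences. */
    static double medianByMinSum(double[] samples) {
        double best = samples[0];
        double bestSum = sumAbsDiff(best, samples);
        for (double p : samples) {
            double s = sumAbsDiff(p, samples);
            if (s < bestSum) {
                best = p;
                bestSum = s;
            }
        }
        return best;
    }

    /** Conventional median: sort a copy and take the center element (odd n assumed). */
    static double medianBySorting(double[] samples) {
        double[] sorted = samples.clone();
        Arrays.sort(sorted);
        return sorted[sorted.length / 2];
    }

    public static void main(String[] args) {
        double[] p = {7, 1, 12, 3, 5};
        System.out.println(medianByMinSum(p) + " " + medianBySorting(p));
    }
}
```

For the sequence (7, 1, 12, 3, 5) both methods select 5, whose sum of absolute differences is 15; the next best candidates (3 and 7) both have a sum of 17.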

With this definition, the concept of the median can be easily extended from the scalar situation to the case of multi-dimensional data. Given a sequence of vector-valued samples $P = (p_1, \ldots, p_n)$, with $p_i \in \mathbb{R}^K$, we define the median element $p_m$ to satisfy

$$\sum_{i=1}^{n} \|p_m - p_i\| \;\le\; \sum_{i=1}^{n} \|p_j - p_i\| \qquad (15.21)$$

for every possible $p_j \in P$. This is analogous to Eqn. (15.20), with the exception that the scalar difference $|\cdot|$ has been replaced by the vector norm $\|\cdot\|$ for measuring the distance between two points in the K-dimensional space.5 We call

$$D_L(p, P) = \sum_{p_i \in P} \|p - p_i\|_L \qquad (15.22)$$

the “aggregate distance” of the sample vector $p$ with respect to all samples $p_i$ in $P$ under the distance norm $L$. Common choices for the distance norm are the $L_1$, $L_2$, and $L_\infty$ norms, that is,

$$\|p - q\|_1 = \sum_{k=1}^{K} |p_k - q_k|, \qquad (15.23)$$

$$\|p - q\|_2 = \Bigl(\sum_{k=1}^{K} |p_k - q_k|^2\Bigr)^{1/2}, \qquad (15.24)$$

$$\|p - q\|_\infty = \max_{1 \le k \le K} |p_k - q_k|. \qquad (15.25)$$

5 K denotes the dimensionality of the samples $p_i$: for example, K = 3 for RGB color samples.

Fig. 15.12
Noisy test image (a) with enlarged details (b,c), used in the following examples.


The vector median of the sequence $P$ can thus be defined as

$$\mathrm{median}(P) = \operatorname*{argmin}_{p \in P} D_L(p, P), \qquad (15.26)$$

that is, the sample $p$ with the smallest aggregate distance to all other elements in $P$.
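As an illustration of Eqns. (15.22)-(15.26), here is a minimal Java sketch of the three distance norms, the aggregate distance, and the vector median as an argmin over the samples. The names `VectorMedian` and `Norm` are hypothetical (they are not the imagingbook `NormType` API), and samples are assumed to be `double[]` vectors of equal length K.

```java
import java.util.List;

/** Sketch of Eqns. (15.22)-(15.26): distance norms, aggregate distance, and the vector median. */
final class VectorMedian {

    enum Norm { L1, L2, LINF }

    /** Distance ||p - q|| under the selected norm, Eqns. (15.23)-(15.25). */
    static double distance(double[] p, double[] q, Norm norm) {
        double acc = 0;
        for (int k = 0; k < p.length; k++) {
            double d = Math.abs(p[k] - q[k]);
            switch (norm) {
                case L1:   acc += d; break;              // sum of absolute differences
                case L2:   acc += d * d; break;          // sum of squares, root taken below
                case LINF: acc = Math.max(acc, d); break; // largest component difference
            }
        }
        return (norm == Norm.L2) ? Math.sqrt(acc) : acc;
    }

    /** Aggregate distance D_L(p, P), Eqn. (15.22). */
    static double aggregateDistance(double[] p, List<double[]> P, Norm norm) {
        double sum = 0;
        for (double[] q : P) {
            sum += distance(p, q, norm);
        }
        return sum;
    }

    /** Vector median: the sample in P with the smallest aggregate distance, Eqn. (15.26). */
    static double[] median(List<double[]> P, Norm norm) {
        double[] best = null;
        double bestDist = Double.POSITIVE_INFINITY;
        for (double[] p : P) {
            double d = aggregateDistance(p, P, norm);
            if (d < bestDist) {
                bestDist = d;
                best = p;
            }
        }
        return best;
    }
}
```

For the samples (0,0,0), (1,0,0), (2,0,0), (3,0,0), and (100,100,100), the median returned under both the L1 and L2 norms is (2,0,0), one of the original samples; the outlier does not shift the result.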

A straightforward implementation of the vector median filter for RGB images is given in Alg. 15.2. The calculation of the aggregate distance $D_L(p, P)$ is performed by the function AggregateDistance(p, P). At any position (u, v), the center pixel is replaced by the neighborhood pixel with the smallest aggregate distance $d_{min}$, but only if it is smaller than the center pixel's aggregate distance $d_{ctr}$ (line 15). Otherwise, the center pixel is left unchanged (line 17). This prevents the center pixel from being needlessly replaced by another color that merely happens to have the same aggregate distance.
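Putting the pieces of Alg. 15.2 together, a compact Java sketch of the complete filter might look as follows. It assumes an RGB image stored as `int[M][N][3]` (`img[u][v] = {R, G, B}`), uses the L1 norm, and simply skips support-region positions that fall outside the image; border handling is not specified in Alg. 15.2, so this choice, like the class name, is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of Alg. 15.2 for an RGB image stored as img[u][v] = {R, G, B}, using the L1 norm. */
final class VectorMedianFilterSketch {

    static int l1(int[] p, int[] q) {
        return Math.abs(p[0] - q[0]) + Math.abs(p[1] - q[1]) + Math.abs(p[2] - q[2]);
    }

    /** Aggregate distance of p to all samples in P. */
    static long aggregateDistance(int[] p, List<int[]> P) {
        long d = 0;
        for (int[] q : P) {
            d += l1(p, q);
        }
        return d;
    }

    /** Pixels within a disk of radius r around (u, v); positions outside the image are skipped. */
    static List<int[]> supportRegion(int[][][] img, int u, int v, double r) {
        int M = img.length, N = img[0].length;
        List<int[]> P = new ArrayList<>();
        for (int i = (int) Math.floor(u - r); i <= (int) Math.ceil(u + r); i++) {
            for (int j = (int) Math.floor(v - r); j <= (int) Math.ceil(v + r); j++) {
                boolean inside = i >= 0 && i < M && j >= 0 && j < N;
                if (inside && (u - i) * (u - i) + (v - j) * (v - j) <= r * r) {
                    P.add(img[i][j]);
                }
            }
        }
        return P;
    }

    static int[][][] filter(int[][][] img, double r) {
        int M = img.length, N = img[0].length;
        int[][][] out = new int[M][N][];
        for (int u = 0; u < M; u++) {
            for (int v = 0; v < N; v++) {
                int[] pCtr = img[u][v];
                List<int[]> P = supportRegion(img, u, v, r);
                long dCtr = aggregateDistance(pCtr, P);
                int[] pMin = pCtr;
                long dMin = Long.MAX_VALUE;
                for (int[] p : P) {
                    long d = aggregateDistance(p, P);
                    if (d < dMin) {
                        pMin = p;
                        dMin = d;
                    }
                }
                // replace only if strictly better than the center pixel (cf. Alg. 15.2)
                out[u][v] = (dMin < dCtr) ? pMin.clone() : pCtr.clone();
            }
        }
        return out;
    }
}
```

Applied with r = 1 to a uniform gray image with a single red impulse, the impulse is replaced by gray. The gray neighbors, whose best candidate only ties with their own aggregate distance, are left unchanged.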

The optimal choice of the norm $L$ for calculating the distances between color vectors in Eqn. (15.22) depends on the assumed noise distribution of the underlying signal [10]. The effects of using different norms ($L_1$, $L_2$, $L_\infty$) are shown in Fig. 15.13 (see Fig. 15.12 for the original images). Although the results for these norms differ numerically, the differences are hardly noticeable in real images (particularly in print). Unless otherwise noted, the $L_1$ norm is used in all subsequent examples.

Results of the scalar median filter and the vector median filter are compared in Fig. 15.14. Note how new colors are introduced by the scalar filter at certain locations (Fig. 15.14(a,c)), as illustrated in Fig. 15.11. In contrast, the vector median filter (Fig. 15.14(b,d)) can only produce colors that already exist in the original image. Figure 15.15 shows the results of applying the vector median filter to real color images while varying the filter radius.

Alg. 15.2
Vector median filter for color images.

1: VectorMedianFilter(I, r)
   Input: I = (I_R, I_G, I_B), a color image of size M × N; r, filter radius (r ≥ 1).
   Returns a new (filtered) color image of size M × N.
2:   (M, N) ← Size(I)
3:   I' ← Duplicate(I)
4:   for all image coordinates (u, v) ∈ M × N do
5:     p_ctr ← I(u, v)    ▷ center pixel of support region
6:     P ← GetSupportRegion(I, u, v, r)
7:     d_ctr ← AggregateDistance(p_ctr, P)
8:     d_min ← ∞
9:     for all p ∈ P do
10:      d ← AggregateDistance(p, P)
11:      if d < d_min then
12:        p_min ← p
13:        d_min ← d
14:    if d_min < d_ctr then
15:      I'(u, v) ← p_min    ▷ modify this pixel
16:    else
17:      I'(u, v) ← I(u, v)    ▷ keep the original pixel value
18:  return I'

19: GetSupportRegion(I, u, v, r)
   Returns a vector of n pixel values P = (p_1, p_2, ..., p_n) from image I that are inside a disk of radius r, centered at position (u, v).
20:  P ← ()
21:  for i ← ⌊u − r⌋, ..., ⌈u + r⌉ do
22:    for j ← ⌊v − r⌋, ..., ⌈v + r⌉ do
23:      if (u − i)² + (v − j)² ≤ r² then
24:        p ← I(i, j)
25:        P ← P ⌢ (p)    ▷ append p to P
26:  return P    ▷ P = (p_1, p_2, ..., p_n)

27: AggregateDistance(p, P)
   Returns the aggregate distance D_L(p, P) of the sample vector p over all elements p_i ∈ P (see Eqn. (15.22)).
28:  d ← 0
29:  for all q ∈ P do
30:    d ← d + ‖p − q‖_L    ▷ choose any distance norm L
31:  return d

Since the vector median filter relies on measuring the distance between pairs of colors, the considerations in Sec. 15.1.2 regarding the metric properties of the color space apply here as well. It is thus not uncommon to perform this filter operation in a perceptually uniform color space, such as CIELUV or CIELAB, rather than in RGB [132,240,254].

The vector median filter is computationally expensive. Calculating the aggregate distance for all sample vectors $p_i$ in $P$ requires $O(n^2)$ steps, for a support region of size n. Finding the candidate neighborhood pixel with the minimum aggregate distance in $P$ can be done in $O(n)$. Since n is proportional to the square of the filter radius r, the number of steps required for calculating a single image pixel is roughly $O(r^4)$. While faster implementations have been proposed [10,18,221], calculating the vector median filter remains computationally demanding.

Fig. 15.13
Results of vector median filtering using different color distance norms: $L_1$ norm (a), $L_2$ norm (b), $L_\infty$ norm (c). Filter radius r = 2.0.

Fig. 15.14
Scalar median vs. vector median filter applied to a color test image, with filter radius r = 2.0 (a, b) and r = 5.0 (c, d). Note how the scalar median filter (a, c) introduces new colors that are not contained in the original image.

15.2.3  Sharpening  Vector  Median  Filter 

Although the vector median filter is a good solution for suppressing impulse noise and additive Gaussian noise in color images, it tends to blur or even eliminate relevant structures, such as lines and edges. The sharpening vector median filter, proposed in [155], aims at improving the edge preservation properties of the standard vector median filter described earlier. The key idea is to calculate the aggregate distances not against all other samples in the neighborhood but only against the most similar ones. The rationale is that samples deviating strongly from their neighbors tend to be outliers (e.g., caused by nearby edges) and should be excluded from the median calculation to avoid blurring of structural details.

The operation of the sharpening vector median filter is summarized in Alg. 15.3. For calculating the aggregate distance $D_L(p, P)$ of a given sample vector $p$ (see Eqn. (15.22)), not all samples in $P$ are considered, but only the $a$ samples that are closest to $p$ in the 3D color space ($a$ being a fixed fraction of the support region size). The subsequent minimization is performed over this so-called “trimmed aggregate distance”. Thus, only a fixed number ($a$) of neighborhood pixels is included in the calculation of the aggregate distances. As a consequence, the sharpening vector median filter provides good noise removal while at the same time leaving edge structures intact.



Fig. 15.15
Vector median filter with varying radii applied to a real color image ($L_1$ norm).



Alg. 15.3
Sharpening vector median filter for RGB color images (extension of Alg. 15.2). The sharpening parameter s ∈ [0, 1] controls the number of most-similar neighborhood pixels included in the median calculation. For s = 0, all pixels in the given support region are included and no sharpening occurs; setting s = 1 leads to maximum sharpening. The threshold parameter t controls how much smaller the aggregate distance of any neighborhood pixel must be to replace the current center pixel.

1: SharpeningVectorMedianFilter(I, r, s, t)
   Input: I, a color image of size M × N, I(u, v) ∈ ℝ³; r, filter radius (r ≥ 1); s, sharpening parameter (0 ≤ s ≤ 1); t, threshold (t ≥ 0). Returns a new (filtered) color image of size M × N.
2:   (M, N) ← Size(I)
3:   I' ← Duplicate(I)
4:   for all image coordinates (u, v) ∈ M × N do
5:     P ← GetSupportRegion(I, u, v, r)    ▷ see Alg. 15.2
6:     n ← |P|    ▷ size of P
7:     a ← round(n − s·(n − 2))    ▷ a = 2, ..., n
8:     d_ctr ← TrimmedAggregateDistance(I(u, v), P, a)
9:     d_min ← ∞
10:    for all p ∈ P do
11:      d ← TrimmedAggregateDistance(p, P, a)
12:      if d < d_min then
13:        p_min ← p
14:        d_min ← d
15:    if (d_ctr − d_min) > t·a then
16:      I'(u, v) ← p_min    ▷ replace the center pixel
17:    else
18:      I'(u, v) ← I(u, v)    ▷ keep the original center pixel
19:  return I'

20: TrimmedAggregateDistance(p, P, a)
   Returns the aggregate distance from p to the a most similar elements in P = (p_1, p_2, ..., p_n).
21:  n ← |P|    ▷ size of P
22:  Create map D : [1, n] → ℝ
23:  for i ← 1, ..., n do
24:    D(i) ← ‖p − P(i)‖_L    ▷ choose any distance norm L
25:  D' ← Sort(D)    ▷ D'(1) ≤ D'(2) ≤ ... ≤ D'(n)
26:  d ← 0
27:  for i ← 2, ..., a do    ▷ D'(1) = 0, thus skipped
28:    d ← d + D'(i)
29:  return d
Typically, the aggregate distance of $p$ to the $a$ closest neighborhood samples is found by first calculating the distances between $p$ and all other samples in $P$, then sorting the result, and finally adding up only the first $a$ elements of the sorted distances (see procedure TrimmedAggregateDistance(p, P, a) in Alg. 15.3). Thus the sharpening vector median filter requires an additional sorting step over $n \propto r^2$ elements at each pixel, which further adds to its time complexity.


The parameter $s$ in Alg. 15.3 specifies the fraction of region pixels included in the calculation of the median and thus controls the amount of sharpening. The number of incorporated pixels $a$ is determined as $a = \mathrm{round}(n - s \cdot (n - 2))$ (see Alg. 15.3, line 7), so that $a = n, \ldots, 2$ for $s \in [0, 1]$. With $s = 0$, all $a = |P| = n$ pixels in the filter region are included in the median calculation and the filter behaves like the ordinary vector median filter described in Alg. 15.2. At maximum sharpening (i.e., with $s = 1$), the calculation of the aggregate distance includes only the single most similar color pixel in the neighborhood $P$ (other than $p$ itself).
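The mapping from $s$ to $a$ can be tabulated with a few lines of Java (the class name is hypothetical). In this sketch `Math.round` rounds halves up, which reproduces the values a = 21, 17, 12, 6 listed for n = 21 in Fig. 15.16 and gives a = 2 for s = 1.

```java
/** Sketch: number of closest samples a used by the sharpening vector median filter (Alg. 15.3, line 7). */
final class SharpeningCount {

    /** a = round(n - s * (n - 2)); yields n for s = 0 and 2 for s = 1. */
    static int closestCount(int n, double s) {
        if (s < 0 || s > 1) {
            throw new IllegalArgumentException("s must be in [0, 1]");
        }
        return (int) Math.round(n - s * (n - 2));
    }

    public static void main(String[] args) {
        int n = 21; // support region size quoted for r = 2.0 in Fig. 15.16
        for (double s : new double[] {0.0, 0.2, 0.5, 0.8, 1.0}) {
            System.out.println("s = " + s + " -> a = " + closestCount(n, s));
        }
    }
}
```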


The calculation of the “trimmed aggregate distance” is shown in Alg. 15.3 (lines 20-29). The function TrimmedAggregateDistance(p, P, a) calculates the aggregate distance for a given vector (color sample) $p$ over the $a$ closest samples in the support region $P$. Initially (in line 24), the n distances $D(i)$ between $p$ and all elements in $P$ are calculated, with $D(i) = \|p - P(i)\|_L$ (see Eqns. (15.23)-(15.25)). These are subsequently sorted by increasing value (line 25) and the sum of the $a$ smallest values $D'(1), \ldots, D'(a)$ (line 28) is returned.6
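A minimal Java sketch of TrimmedAggregateDistance(p, P, a), assuming the L1 norm and `double[]` color vectors (the class name is hypothetical). Because Java arrays are 0-based, the loop that sums D'(2), ..., D'(a) in Alg. 15.3 becomes a sum over indices 1 to a - 1; as in the algorithm, the smallest distance is skipped on the assumption that p itself is an element of P.

```java
import java.util.Arrays;
import java.util.List;

/** Sketch of TrimmedAggregateDistance(p, P, a) from Alg. 15.3, using the L1 norm. */
final class TrimmedDistance {

    static double l1(double[] p, double[] q) {
        double sum = 0;
        for (int k = 0; k < p.length; k++) {
            sum += Math.abs(p[k] - q[k]);
        }
        return sum;
    }

    /** Sum of the distances from p to its closest samples in P, skipping the zero distance of p to itself. */
    static double trimmedAggregateDistance(double[] p, List<double[]> P, int a) {
        int n = P.size();
        double[] D = new double[n];
        for (int i = 0; i < n; i++) {
            D[i] = l1(p, P.get(i));
        }
        Arrays.sort(D); // D[0] <= D[1] <= ...; D[0] == 0 if p is an element of P
        double d = 0;
        for (int i = 1; i < a; i++) { // 0-based: sum D[1] .. D[a-1]
            d += D[i];
        }
        return d;
    }
}
```

With p = (0,0,0) and P containing p plus samples at L1 distances 1, 3, and 10, the sketch returns 14 for a = 4, 4 for a = 3, and 1 for a = 2.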

The effects of varying the sharpening parameter $s$ are shown in Fig. 15.16, with a fixed filter radius r = 2.0 and threshold t = 0. For s = 0.0 (Fig. 15.16(a)), the result is the same as that of the ordinary vector median filter (see Fig. 15.15(b)).

The value of the current center pixel is only replaced by a neighboring pixel value if the corresponding minimal (trimmed) aggregate distance $d_{min}$ is significantly smaller than the center pixel's aggregate distance $d_{ctr}$. In Alg. 15.3, this is controlled by the threshold $t$. The center pixel is replaced only if the condition

$$(d_{ctr} - d_{min}) > t \cdot a \qquad (15.27)$$

holds; otherwise it remains unmodified. Note that the distance limit is proportional to $a$, so $t$ effectively specifies the minimum “average” pixel distance; it is independent of the filter radius $r$ and the sharpening parameter $s$.
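The threshold test of Eqn. (15.27) fits into one filter step as sketched below in Java. This is an illustration under stated assumptions (L1 norm, support region P already collected and containing the center pixel, hypothetical class name), not the imagingbook implementation.

```java
import java.util.Arrays;
import java.util.List;

/** Sketch of one step of Alg. 15.3 (sharpening vector median filter) at a single pixel, L1 norm. */
final class SharpeningStep {

    static double l1(double[] p, double[] q) {
        double sum = 0;
        for (int k = 0; k < p.length; k++) sum += Math.abs(p[k] - q[k]);
        return sum;
    }

    /** Trimmed aggregate distance: sum of the a - 1 smallest nonzero-index sorted distances. */
    static double trimmed(double[] p, List<double[]> P, int a) {
        double[] D = new double[P.size()];
        for (int i = 0; i < D.length; i++) D[i] = l1(p, P.get(i));
        Arrays.sort(D);
        double d = 0;
        for (int i = 1; i < a; i++) d += D[i]; // skip D[0] = 0 (p itself)
        return d;
    }

    /** Returns the new value of the center pixel, given its support region P (which contains center). */
    static double[] filterPixel(double[] center, List<double[]> P, double s, double t) {
        int n = P.size();
        int a = (int) Math.round(n - s * (n - 2));
        double dCtr = trimmed(center, P, a);
        double[] pMin = center;
        double dMin = Double.POSITIVE_INFINITY;
        for (double[] p : P) {
            double d = trimmed(p, P, a);
            if (d < dMin) {
                pMin = p;
                dMin = d;
            }
        }
        // Eqn. (15.27): replace only if the improvement exceeds t per included sample
        return (dCtr - dMin > t * a) ? pMin : center;
    }
}
```

For the center (9,0,0) with neighbors (0,0,0), (1,0,0), (2,0,0), (3,0,0) and s = 0 (so a = 5), the improvement is 30 − 11 = 19: the center is replaced by (2,0,0) for t = 3 (limit 15) but kept for t = 4 (limit 20).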

Results for typical values of $t$ (in the range 0, ..., 10) are shown in Figs. 15.17-15.18. To illustrate the effect, the images in Fig. 15.18 only display those pixels that were not replaced by the filter, while all modified pixels are set to black. As one would expect, increasing the threshold $t$ leads to fewer pixels being modified. Of course, the same thresholding scheme may also be used with the ordinary vector median filter (see Exercise 15.2).


15.3  Java  Implementation 

Implementations of the scalar and vector median filter as well as the sharpening vector median filter are available with full Java source code at the book's website.7 The corresponding classes

• ScalarMedianFilter,
• VectorMedianFilter, and
• VectorMedianFilterSharpen

are based on the common superclass GenericFilter, which provides the abstract method

void applyTo(ImageProcessor ip),

which greatly simplifies the use of these filters. The code segment in Prog. 15.1 demonstrates the use of the class VectorMedianFilter (with radius 3.0 and $L_1$ norm) for RGB color images in an ImageJ plugin. For the specific filters described in this chapter, the following constructors are provided:

6 D'(1) is zero because it is the distance between p and itself.

7 Package imagingbook.pub.colorfilters.
Fig. 15.16
Sharpening vector median filter with different sharpness values s. The filter radius is r = 2.0 and the corresponding filter mask contains n = 21 pixels. At each pixel, only the a = 21, 17, 12, 6 closest color samples (for sharpness s = 0.0, 0.2, 0.5, 0.8, respectively) are considered when calculating the local vector median.


ScalarMedianFilter(Parameters params)
Creates a scalar median filter, as described in Sec. 15.2.1, with parameter radius = 3.0 (default).

VectorMedianFilter(Parameters params)
Creates a vector median filter, as described in Sec. 15.2.2, with parameters radius = 3.0 (default) and distanceNorm = NormType.L1 (default), L2, or Lmax.

VectorMedianFilterSharpen(Parameters params)
Creates a sharpening vector median filter (see Sec. 15.2.3) with parameters radius = 3.0 (default), distanceNorm = NormType.L1 (default), L2, or Lmax, sharpening factor sharpen = 0.5 (default), and threshold = 0.0 (default).

The listed default values pertain to the parameterless constructors that are also available. See the online API documentation or the source code for additional details. Note that the created filter objects are generic and can be applied to both grayscale and color images without any modification.

Fig. 15.17
Sharpening vector median filter with different threshold values t = 0, 2, 5, 10. The filter radius and sharpening factor are fixed at r = 2.0 and s = 0.0, respectively.


15.4  Further  Reading 

A  good  overview  of  different  linear  and  nonlinear  filtering  techniques 
for  color  images  can  be  found  in  [141].  In  [186,  Ch.  2],  the  authors 
give  a  concise  treatment  of  color  image  filtering,  including  statistical 
noise  models,  vector  ordering  schemes,  and  different  color  similarity 
measures.  Several  variants  of  weighted  median  filters  for  color  images 
and  multi-channel  data  in  general  are  described  in  [6,  Ch.  2,  Sec.  2.4]. 
A very readable and up-to-date survey of important color issues in computer vision, such as color constancy, photometric invariance, and color feature extraction, can be found in [83]. A vector median filter operating in HSV color space is proposed in [240]. In addition to the techniques discussed in this chapter, most of the filters described in Chapter 17 can either be applied directly to color images or easily modified for this purpose.

Fig. 15.18
Sharpening vector median filter with different threshold values t = 0, 2, 5, 10 (also see Fig. 15.17). Only the unmodified pixels are shown in color, while all modified pixels are set to black. The filter radius and sharpening factor are fixed at r = 2.0 and s = 0.0, respectively.


15.5  Exercises 

Exercise 15.1. Verify Eqn. (15.20) by showing (formally or experimentally) that the usual calculation of the scalar median (by sorting a sequence and selecting the center value) indeed gives the value with the smallest sum of differences from all other values in the same sequence. Is the result independent of the type of distance norm used?
1  import ij.ImagePlus;
2  import ij.plugin.filter.PlugInFilter;
3  import ij.process.ImageProcessor;
4  import imagingbook.lib.math.VectorNorm.NormType;
5  import imagingbook.lib.util.Enums;
6  import imagingbook.pub.colorfilters.VectorMedianFilter;
7  import imagingbook.pub.colorfilters.VectorMedianFilter.Parameters;
8
9  public class MedianFilter_Color_Vector implements PlugInFilter
10 {
11     public int setup(String arg, ImagePlus imp) {
12         return DOES_RGB;
13     }
14
15     public void run(ImageProcessor ip) {
16         Parameters params =
17             new VectorMedianFilter.Parameters();
18         params.distanceNorm = NormType.L1;
19         params.radius = 3.0;
20
21         VectorMedianFilter filter =
22             new VectorMedianFilter(params);
23
24         filter.applyTo(ip);
25     }
26 }

Prog. 15.1
Color median filter using class VectorMedianFilter. In line 17, a suitable parameter object (with default values) is created, then modified and passed to the constructor of the filter (in line 22). The filter itself is applied to the input image, which is destructively modified (in line 24).


Exercise  15.2.  Modify  the  ordinary  vector  median  filter  described  in 
Alg.  15.2  to  incorporate  a  threshold  t  for  deciding  whether  to  modify 
the  current  center  pixel  or  not,  analogous  to  the  approach  taken  in 
the  sharpening  vector  median  filter  in  Alg.  15.3. 

Exercise  15.3.  Implement  a  dedicated  median  filter  (analogous  to 
Alg.  15.1)  for  the  HSV  color  space.  The  filter  should  process  the 
color  components  independently  but  consider  the  circular  nature  of 
the  hue  component,  as  discussed  in  Sec.  15.1.3.  Compare  the  results 
to  the  vector-median  filter  in  Sec.  15.2.2. 
16


Edge  Detection  in  Color  Images 


Edge  information  is  essential  in  many  image  analysis  and  computer 
vision  applications  and  thus  the  ability  to  locate  and  characterize 
edges  robustly  and  accurately  is  an  important  task.  Basic  techniques 
for  edge  detection  in  grayscale  images  are  discussed  in  Chapter  6. 
Color  images  contain  richer  information  than  grayscale  images  and 
it  appears  natural  to  assume  that  edge  detection  methods  based  on 
color should outperform their monochromatic counterparts. For example, locating an edge between two image regions of different hue but similar brightness is difficult with an edge detector that only looks for changes in image intensity. In this chapter, we first look at the use of “ordinary” (i.e., monochromatic) edge detectors for color images and then discuss dedicated detectors that are specifically designed for color images.

Although the problem of color edge detection has been pursued for a long time (see [140,266] for a good overview), most image processing texts do not treat this subject in much detail. One reason could be that, in practice, edge detection in color images is often accomplished by using “monochromatic” techniques on the intensity channel or the individual color components. We discuss these simple methods, which nevertheless give satisfactory results in many situations, in Sec. 16.1.

Unfortunately, monochromatic techniques do not extend naturally to color images and other “multi-channel” data, since edge information in the different color channels may be ambiguous or even contradictory. For example, multiple edges running in different directions may coincide at a given image location, edge gradients may cancel out, or edges in different channels may be slightly displaced. In Sec. 16.2, we describe how local gradients can be calculated for edge detection by treating the color image as a 2D vector field. In Sec. 16.3, we show how the popular Canny edge detector, originally designed for monochromatic images, can be adapted for color images, and Sec. 16.4 goes on to look at other color edge operators. Implementations of the discussed algorithms are described in Sec. 16.5, with complete source code available on the book's website.


16.1  Monochromatic  Techniques 


Linear filters are the basis of most edge enhancement and edge detection operators for scalar-valued grayscale images, particularly the gradient filters described in Chapter 6, Sec. 6.3. Again, it is quite common to apply these scalar filters separately to the individual color channels of RGB images. A popular example is the Sobel operator with the filter kernels


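$$H_x = \frac{1}{8} \begin{bmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{bmatrix} \quad\text{and}\quad H_y = \frac{1}{8} \begin{bmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ 1 & 2 & 1 \end{bmatrix} \qquad (16.1)$$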


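for the x- and y-direction, respectively. Applied to a grayscale image $I$, with $I_x = I * H_x$ and $I_y = I * H_y$, these filters give a reasonably good estimate of the local gradient vector,

$$\nabla I(\boldsymbol{u}) = \begin{pmatrix} I_x(\boldsymbol{u}) \\ I_y(\boldsymbol{u}) \end{pmatrix}, \qquad (16.2)$$

at position $\boldsymbol{u} = (u, v)$. The local edge strength of the grayscale image is then taken as

$$E_{gray}(\boldsymbol{u}) = \|\nabla I(\boldsymbol{u})\| = \bigl[I_x^2(\boldsymbol{u}) + I_y^2(\boldsymbol{u})\bigr]^{1/2}, \qquad (16.3)$$

and the corresponding edge orientation is calculated as

$$\Phi(\boldsymbol{u}) = \angle \nabla I(\boldsymbol{u}) = \tan^{-1}\!\left(\frac{I_y(\boldsymbol{u})}{I_x(\boldsymbol{u})}\right). \qquad (16.4)$$

The angle $\Phi(\boldsymbol{u})$ gives the direction of maximum intensity change on the 2D image surface at position $\boldsymbol{u}$, which is the normal to the edge tangent.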
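A small Java sketch of Eqns. (16.1)-(16.4) at a single interior pixel may help. It assumes a grayscale image stored as `I[u][v]` with u as the x and v as the y coordinate, applies the kernels without mirroring (so that $I_x$ is positive where intensity increases in x), and uses `Math.atan2` rather than a plain arctangent so that Φ covers the full circle of directions; the class name is hypothetical.

```java
/** Sketch of Eqns. (16.1)-(16.4) for a grayscale image stored as I[u][v] (u = x, v = y). */
final class SobelGray {

    // Sobel kernels of Eqn. (16.1) without the 1/8 factor; rows = y offset, columns = x offset
    static final double[][] HX = { {-1, 0, 1}, {-2, 0, 2}, {-1, 0, 1} };
    static final double[][] HY = { {-1, -2, -1}, {0, 0, 0}, {1, 2, 1} };

    /** Applies kernel H (scaled by 1/8) at the interior position (u, v), without mirroring the kernel. */
    static double apply(double[][] I, double[][] H, int u, int v) {
        double sum = 0;
        for (int j = -1; j <= 1; j++) {          // y offset
            for (int i = -1; i <= 1; i++) {      // x offset
                sum += H[j + 1][i + 1] * I[u + i][v + j];
            }
        }
        return sum / 8;
    }

    /** Returns {I_x, I_y, E_gray, Phi} at (u, v); Phi in radians. */
    static double[] gradient(double[][] I, int u, int v) {
        double ix = apply(I, HX, u, v);
        double iy = apply(I, HY, u, v);
        double e = Math.hypot(ix, iy);       // edge strength, Eqn. (16.3)
        double phi = Math.atan2(iy, ix);     // orientation, Eqn. (16.4), quadrant-aware
        return new double[] {ix, iy, e, phi};
    }
}
```

For a ramp that rises by 10 per pixel in x, the sketch yields $I_x = 10$, $I_y = 0$, $E_{gray} = 10$, and Φ = 0; for the same ramp in y it yields Φ = π/2.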

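Analogously, to apply this technique to a color image $I = (I_R, I_G, I_B)$, each color plane is first filtered individually with the two gradient kernels given in Eqn. (16.1), resulting in

$$\nabla I_R = \begin{pmatrix} I_{R,x} \\ I_{R,y} \end{pmatrix} = \begin{pmatrix} I_R * H_x \\ I_R * H_y \end{pmatrix}, \quad
\nabla I_G = \begin{pmatrix} I_{G,x} \\ I_{G,y} \end{pmatrix} = \begin{pmatrix} I_G * H_x \\ I_G * H_y \end{pmatrix}, \quad
\nabla I_B = \begin{pmatrix} I_{B,x} \\ I_{B,y} \end{pmatrix} = \begin{pmatrix} I_B * H_x \\ I_B * H_y \end{pmatrix}. \qquad (16.5)$$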


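The local edge strength is calculated separately for each color channel, which yields a vector

$$E(\boldsymbol{u}) = \begin{pmatrix} E_R(\boldsymbol{u}) \\ E_G(\boldsymbol{u}) \\ E_B(\boldsymbol{u}) \end{pmatrix} = \begin{pmatrix} \|\nabla I_R(\boldsymbol{u})\| \\ \|\nabla I_G(\boldsymbol{u})\| \\ \|\nabla I_B(\boldsymbol{u})\| \end{pmatrix} \qquad (16.6)$$

$$= \begin{pmatrix} \bigl[I_{R,x}^2(\boldsymbol{u}) + I_{R,y}^2(\boldsymbol{u})\bigr]^{1/2} \\ \bigl[I_{G,x}^2(\boldsymbol{u}) + I_{G,y}^2(\boldsymbol{u})\bigr]^{1/2} \\ \bigl[I_{B,x}^2(\boldsymbol{u}) + I_{B,y}^2(\boldsymbol{u})\bigr]^{1/2} \end{pmatrix} \qquad (16.7)$$

for each image position $\boldsymbol{u}$. These vectors could be combined into a new color image $E = (E_R, E_G, E_B)$, although such a “color edge image” has no particularly useful interpretation.1 Finally, a scalar quantity of combined edge strength ($C$) over all color planes can be obtained, for example, by calculating the Euclidean ($L_2$) norm of $E$ as

$$C_2(\boldsymbol{u}) = \|E(\boldsymbol{u})\|_2 = \bigl[E_R^2(\boldsymbol{u}) + E_G^2(\boldsymbol{u}) + E_B^2(\boldsymbol{u})\bigr]^{1/2}$$
$$= \bigl[I_{R,x}^2 + I_{R,y}^2 + I_{G,x}^2 + I_{G,y}^2 + I_{B,x}^2 + I_{B,y}^2\bigr]^{1/2} \qquad (16.8)$$

(coordinates $(\boldsymbol{u})$ are omitted in the second line) or, using the $L_1$ norm,

$$C_1(\boldsymbol{u}) = \|E(\boldsymbol{u})\|_1 = |E_R(\boldsymbol{u})| + |E_G(\boldsymbol{u})| + |E_B(\boldsymbol{u})|. \qquad (16.9)$$

Another alternative for calculating a combined edge strength is to take the maximum magnitude of the RGB gradients (i.e., the $L_\infty$ norm),

$$C_\infty(\boldsymbol{u}) = \|E(\boldsymbol{u})\|_\infty = \max\bigl(|E_R(\boldsymbol{u})|, |E_G(\boldsymbol{u})|, |E_B(\boldsymbol{u})|\bigr). \qquad (16.10)$$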
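The channel-wise edge strengths and the three combined magnitudes $C_1$, $C_2$, and $C_\infty$ can be sketched in Java as follows, assuming the six derivatives at a pixel have already been computed (for example with the Sobel kernels of Eqn. (16.1)); the class and method names are hypothetical.

```java
/** Sketch of Eqns. (16.6)-(16.10): per-channel edge strengths and their L1, L2, and L-infinity combinations. */
final class ColorEdgeStrength {

    /** grad[k] = {I_k,x, I_k,y} for k = R, G, B; returns E = (E_R, E_G, E_B), Eqn. (16.7). */
    static double[] channelStrengths(double[][] grad) {
        double[] E = new double[grad.length];
        for (int k = 0; k < grad.length; k++) {
            E[k] = Math.hypot(grad[k][0], grad[k][1]);
        }
        return E;
    }

    /** C_1, Eqn. (16.9): sum of the channel magnitudes. */
    static double c1(double[] E) {
        double s = 0;
        for (double e : E) s += Math.abs(e);
        return s;
    }

    /** C_2, Eqn. (16.8): Euclidean norm of E. */
    static double c2(double[] E) {
        double s = 0;
        for (double e : E) s += e * e;
        return Math.sqrt(s);
    }

    /** C_inf, Eqn. (16.10): largest channel magnitude. */
    static double cInf(double[] E) {
        double m = 0;
        for (double e : E) m = Math.max(m, Math.abs(e));
        return m;
    }
}
```

For channel gradients (3, 4), (0, 0), and (0, 12), the sketch gives E = (5, 0, 12) and hence $C_1 = 17$, $C_2 = 13$, and $C_\infty = 12$.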


An  example  using  the  test  image  from  Chapter  15  is  given  in  Fig. 
16.1.  It  shows  the  edge  magnitude  of  the  corresponding  grayscale 
image  and  the  combined  color  edge  magnitude  calculated  with  the 
different  norms  defined  in  Eqns.  (16.8)-(16.10).2 

As far as edge orientation is concerned, there is no simple extension of the grayscale case. While edge orientation can easily be calculated for each individual color component (using Eqn. (16.4)), the gradients of the three color channels are generally different (or even contradictory) and there is no obvious way of combining them.

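A simple ad hoc approach is to choose, at each image position $\boldsymbol{u}$, the gradient direction from the color channel of maximum edge strength, that is,

$$\Phi_{col}(\boldsymbol{u}) = \tan^{-1}\!\left(\frac{I_{m,y}(\boldsymbol{u})}{I_{m,x}(\boldsymbol{u})}\right), \quad\text{with}\quad m = \operatorname*{argmax}_{k = R,G,B} E_k(\boldsymbol{u}). \qquad (16.11)$$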
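Eqn. (16.11) translates into a few lines of Java. The following sketch (hypothetical names) picks the channel with the largest gradient magnitude and returns its orientation computed with `Math.atan2`, which resolves the quadrant that a plain tan⁻¹ of the ratio would lose.

```java
/** Sketch of Eqn. (16.11): edge orientation taken from the channel with maximum edge strength. */
final class DominantChannelOrientation {

    /** grad[k] = {I_k,x, I_k,y}; returns the orientation angle (radians) of the strongest channel. */
    static double orientation(double[][] grad) {
        int m = 0;
        double maxE = -1;
        for (int k = 0; k < grad.length; k++) {
            double e = Math.hypot(grad[k][0], grad[k][1]); // E_k, as in Eqn. (16.7)
            if (e > maxE) {
                maxE = e;
                m = k;
            }
        }
        return Math.atan2(grad[m][1], grad[m][0]);
    }
}
```

With channel gradients (1, 0), (0, 5), and (2, 2), the green channel dominates and the returned orientation is π/2.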

This simple (monochromatic) method for calculating edge strength and orientation in color images is summarized in Alg. 16.1 (see Sec. 16.5 for the corresponding Java implementation). Two sample results are shown in Fig. 16.2. For comparison, these figures also show the edge maps obtained by first converting the color image to a grayscale image and then applying the Sobel operator3 (Fig. 16.2(b)). The edge magnitude in all examples is normalized; it is shown inverted and contrast-enhanced to increase the visibility of low-contrast edges. As expected and apparent from the examples, even simple monochromatic techniques applied to color images perform better than edge detection on the corresponding grayscale images. In particular, edges between color regions of similar brightness are not detectable in the grayscale image, so using color information for edge detection is generally more powerful than relying on intensity alone. Among the simple color techniques, the maximum channel edge strength $C_\infty$ (Eqn. (16.10)) seems to give the most consistent results with the fewest edges getting lost.

1 Such images are nevertheless produced by the “Find Edges” command in ImageJ and the filter of the same name in Photoshop (showing inverted components).

2 In this case, the grayscale image in (c) was calculated with the direct conversion method (see Chapter 14, Eqn. (14.39)) from nonlinear sRGB components. With linear grayscale conversion (Ch. 14, Eqn. (14.37)), the desaturated bar at the center would exhibit no grayscale edges along its borders, since the luminance is the same inside and outside.

Fig. 16.1
Color edge enhancement with monochromatic methods. Original color image (a) and corresponding grayscale image (b); edge magnitude from the grayscale image (c). Color edge magnitude calculated with different norms: $L_1$ (d), $L_2$ (e), and $L_\infty$ (f). The images in (c-f) are inverted for better viewing.

However, none of the monochromatic detection techniques can be expected to work reliably under these circumstances. While the threshold for binarizing the edge magnitude could be tuned manually to give more pleasing results on specific images, it is difficult in practice to achieve consistently good results over a wide range of images. Methods for determining the optimal edge threshold dynamically, that is, depending on the image content, have been proposed, typically based on the statistical variability of the color gradients. Additional details can be found in [84, 171, 192].

3 See Chapter 6, Sec. 6.3.1.

Alg. 16.1
Monochromatic color edge operator. A pair of Sobel-type filter kernels ($H_x$, $H_y$) is used to estimate the local x/y gradients of each component of the RGB input image $I$. Color edge magnitude is calculated as the $L_2$ norm of the color gradient vector (see Eqn. (16.8)). The procedure returns a pair of maps, holding the edge magnitude $E_2$ and the edge orientation, respectively.

Input: I = (I_R, I_G, I_B), an RGB color image of size M × N.


16.2  Edges  in  Vector-Valued  Images 

In the “monochromatic” scheme described in Sec. 16.1, the edge magnitude in each color channel is calculated separately and thus no use is made of the potential coupling between color channels. Only in a subsequent step are the individual edge responses in the color channels combined, albeit in an ad hoc fashion. In other words, the color data are not treated as vectors, but merely as separate and unrelated scalar values.

To obtain better insight into this problem it is helpful to treat the color image as a vector field, a standard construct in vector calculus [32, 223].4 A three-channel RGB color image $I(\boldsymbol{u}) = (I_R(\boldsymbol{u}), I_G(\boldsymbol{u}), I_B(\boldsymbol{u}))$ can be modeled as a discrete 2D vector field, that is, a function whose coordinates $\boldsymbol{u} = (u, v)$ are 2D and whose values are 3D vectors. Similarly, a grayscale image can be described as a discrete scalar field, since its pixel values are only 1D.

4 See Sec. C.2 in the Appendix for some general properties of vector fields.

Fig. 16.2
Example of color edge enhancement with monochromatic techniques (balloons image). Original color image and corresponding grayscale image (a), edge magnitude obtained from the grayscale image (b), color edge magnitude calculated with the $L_2$ norm (c), and the $L_\infty$ norm (d). Differences between the grayscale edge detector (b) and the color-based detector (c-e) are particularly visible inside the right balloon and at the lower borders of the tangerines.
[Panel titles: Original image I; Gray edge ($E_{gray}$); Color edge ($E_1$); Color edge ($E_2$); Color edge ($E_\infty$)]
16.2.1 Multi-Dimensional Gradients




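As noted in the previous section, the gradient of a scalar image $I$ at a specific position $\boldsymbol{u}$ is defined as

$$\nabla I(\boldsymbol{u}) = \begin{pmatrix} \frac{\partial I}{\partial x}(\boldsymbol{u}) \\[2pt] \frac{\partial I}{\partial y}(\boldsymbol{u}) \end{pmatrix}, \tag{16.12}$$

that is, the vector of the partial derivatives of the function $I$ in the x- and y-direction, respectively.5 Obviously, the gradient of a scalar image is a 2D vector field.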

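In the case of a color image $I = (I_R, I_G, I_B)$, we can treat the three color channels as separate scalar images and obtain their gradients analogously as

$$\nabla I_R(\boldsymbol{u}) = \begin{pmatrix} \frac{\partial I_R}{\partial x}(\boldsymbol{u}) \\[2pt] \frac{\partial I_R}{\partial y}(\boldsymbol{u}) \end{pmatrix}, \quad \nabla I_G(\boldsymbol{u}) = \begin{pmatrix} \frac{\partial I_G}{\partial x}(\boldsymbol{u}) \\[2pt] \frac{\partial I_G}{\partial y}(\boldsymbol{u}) \end{pmatrix}, \quad \nabla I_B(\boldsymbol{u}) = \begin{pmatrix} \frac{\partial I_B}{\partial x}(\boldsymbol{u}) \\[2pt] \frac{\partial I_B}{\partial y}(\boldsymbol{u}) \end{pmatrix}, \tag{16.13}$$

which is equivalent to what we did in Eqn. (16.5). Before we can take the next steps, we need to introduce a standard tool for the analysis of vector fields.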


16.2.2 The Jacobian Matrix


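The Jacobian matrix6 $\mathbf{J}_I(\boldsymbol{u})$ combines all first partial derivatives of a vector field $I$ at a given position $\boldsymbol{u}$, its row vectors being the gradients of the scalar component functions. In particular, for an RGB color image $I$, the Jacobian matrix is defined as

$$\mathbf{J}_I(\boldsymbol{u}) = \begin{pmatrix} (\nabla I_R)^\top(\boldsymbol{u}) \\ (\nabla I_G)^\top(\boldsymbol{u}) \\ (\nabla I_B)^\top(\boldsymbol{u}) \end{pmatrix} = \begin{pmatrix} \frac{\partial I_R}{\partial x}(\boldsymbol{u}) & \frac{\partial I_R}{\partial y}(\boldsymbol{u}) \\[2pt] \frac{\partial I_G}{\partial x}(\boldsymbol{u}) & \frac{\partial I_G}{\partial y}(\boldsymbol{u}) \\[2pt] \frac{\partial I_B}{\partial x}(\boldsymbol{u}) & \frac{\partial I_B}{\partial y}(\boldsymbol{u}) \end{pmatrix} = \big( I_x(\boldsymbol{u}) \;\; I_y(\boldsymbol{u}) \big), \tag{16.14}$$

with $\nabla I_R, \nabla I_G, \nabla I_B$ as defined in Eqn. (16.13). We see that the 2D gradient vectors $(\nabla I_R)^\top, (\nabla I_G)^\top, (\nabla I_B)^\top$ constitute the rows of the resulting 3 × 2 matrix $\mathbf{J}_I$. The two 3D column vectors of this matrix,

$$I_x(\boldsymbol{u}) = \frac{\partial I}{\partial x}(\boldsymbol{u}) = \begin{pmatrix} \frac{\partial I_R}{\partial x}(\boldsymbol{u}) \\[2pt] \frac{\partial I_G}{\partial x}(\boldsymbol{u}) \\[2pt] \frac{\partial I_B}{\partial x}(\boldsymbol{u}) \end{pmatrix}, \qquad I_y(\boldsymbol{u}) = \frac{\partial I}{\partial y}(\boldsymbol{u}) = \begin{pmatrix} \frac{\partial I_R}{\partial y}(\boldsymbol{u}) \\[2pt] \frac{\partial I_G}{\partial y}(\boldsymbol{u}) \\[2pt] \frac{\partial I_B}{\partial y}(\boldsymbol{u}) \end{pmatrix}, \tag{16.15}$$

are the partial derivatives of the color components along the x- and y-axes, respectively. At a given position $\boldsymbol{u}$, the total amount of change over all three color channels in the horizontal direction can be quantified by the norm of the corresponding column vector, $\|I_x(\boldsymbol{u})\|$. Analogously, $\|I_y(\boldsymbol{u})\|$ gives the total amount of change over all three color channels along the vertical axis.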
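As a concrete illustration of Eqns. (16.14) and (16.15), the following Java sketch estimates the Jacobian of an RGB image at a single interior pixel together with the column norms ‖I_x(u)‖ and ‖I_y(u)‖. It assumes a hypothetical array layout `rgb[c][v][u]` (channel, row, column) and uses central differences as the finite-difference scheme; both are illustrative choices, not the book's implementation.

```java
/** Sketch: Jacobian J_I(u) of an RGB image at one interior pixel (Eqns. 16.14, 16.15). */
final class ColorJacobian {

    private ColorJacobian() {}

    /**
     * Returns the 3x2 Jacobian: row c holds (dI_c/dx, dI_c/dy) for channel c (0 = R, 1 = G, 2 = B).
     * Assumed (hypothetical) layout: rgb[c][v][u] = channel c, row v, column u.
     * Partial derivatives are estimated by central differences (an assumption).
     */
    static double[][] jacobian(float[][][] rgb, int u, int v) {
        double[][] jac = new double[3][2];
        for (int c = 0; c < 3; c++) {
            jac[c][0] = 0.5 * (rgb[c][v][u + 1] - rgb[c][v][u - 1]); // entry of column I_x
            jac[c][1] = 0.5 * (rgb[c][v + 1][u] - rgb[c][v - 1][u]); // entry of column I_y
        }
        return jac;
    }

    /** Euclidean norm of Jacobian column k (k = 0: ||I_x(u)||, k = 1: ||I_y(u)||). */
    static double columnNorm(double[][] jac, int k) {
        double sum = 0;
        for (double[] row : jac) {
            sum += row[k] * row[k];
        }
        return Math.sqrt(sum);
    }
}
```

For an image whose red channel rises by 2 per column and whose blue channel rises by 1 per row, the sketch yields ‖I_x‖ = 2 and ‖I_y‖ = 1 at any interior pixel.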


5 Of course, images are discrete functions and the partial derivatives are estimated from finite differences (see Sec. C.3.1 in the Appendix).

6 See also Sec. C.2.1 in the Appendix.
16.2.3 Squared Local Contrast


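Now that we can quantify the change along the horizontal and vertical axes at any position $\boldsymbol{u}$, the next task is to find the direction of maximum change, that is, the angle of the edge normal, which we then use to derive the local edge strength. How can we calculate the gradient in some direction $\theta$ other than horizontal and vertical? For this purpose, we use the product of the unit vector oriented at angle $\theta$,

$$\boldsymbol{e}_\theta = \begin{pmatrix} \cos(\theta) \\ \sin(\theta) \end{pmatrix}, \tag{16.16}$$

and the Jacobian matrix $\mathbf{J}_I$ (Eqn. (16.14)) in the form

$$(\mathrm{grad}_\theta I)(\boldsymbol{u}) = \mathbf{J}_I(\boldsymbol{u}) \cdot \boldsymbol{e}_\theta = I_x(\boldsymbol{u}) \cdot \cos(\theta) + I_y(\boldsymbol{u}) \cdot \sin(\theta). \tag{16.17}$$

The resulting 3D vector $(\mathrm{grad}_\theta I)(\boldsymbol{u})$ is called the directional gradient7 of the color image $I$ in the direction $\theta$ at position $\boldsymbol{u}$. By taking the squared norm of this vector,

$$\begin{aligned} S_\theta(I,\boldsymbol{u}) &= \|(\mathrm{grad}_\theta I)(\boldsymbol{u})\|_2^2 = \|I_x(\boldsymbol{u})\cdot\cos(\theta) + I_y(\boldsymbol{u})\cdot\sin(\theta)\|_2^2 \\ &= I_x^2(\boldsymbol{u})\cdot\cos^2(\theta) + 2\cdot I_x(\boldsymbol{u})\cdot I_y(\boldsymbol{u})\cdot\cos(\theta)\cdot\sin(\theta) + I_y^2(\boldsymbol{u})\cdot\sin^2(\theta), \end{aligned} \tag{16.18}$$

we obtain what is called the squared local contrast of the vector-valued image $I$ at position $\boldsymbol{u}$ in direction $\theta$.8 For an RGB image $I = (I_R, I_G, I_B)$, the squared local contrast in Eqn. (16.18) is, explicitly written,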


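$$S_\theta(I,\boldsymbol{u}) = \left\| \begin{pmatrix} I_{R,x}(\boldsymbol{u}) \\ I_{G,x}(\boldsymbol{u}) \\ I_{B,x}(\boldsymbol{u}) \end{pmatrix} \cdot \cos(\theta) + \begin{pmatrix} I_{R,y}(\boldsymbol{u}) \\ I_{G,y}(\boldsymbol{u}) \\ I_{B,y}(\boldsymbol{u}) \end{pmatrix} \cdot \sin(\theta) \right\|_2^2 \tag{16.19}$$

$$\begin{aligned} = \;& \big(I_{R,x}^2(\boldsymbol{u}) + I_{G,x}^2(\boldsymbol{u}) + I_{B,x}^2(\boldsymbol{u})\big)\cdot\cos^2(\theta) + \big(I_{R,y}^2(\boldsymbol{u}) + I_{G,y}^2(\boldsymbol{u}) + I_{B,y}^2(\boldsymbol{u})\big)\cdot\sin^2(\theta) \\ & + 2\cdot\cos(\theta)\cdot\sin(\theta)\cdot\big(I_{R,x}(\boldsymbol{u})\cdot I_{R,y}(\boldsymbol{u}) + I_{G,x}(\boldsymbol{u})\cdot I_{G,y}(\boldsymbol{u}) + I_{B,x}(\boldsymbol{u})\cdot I_{B,y}(\boldsymbol{u})\big). \end{aligned} \tag{16.20}$$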

Note that, in the case that $I$ is a scalar image, the squared local contrast reduces to

$$S_\theta(I,\boldsymbol{u}) = \|(\mathrm{grad}_\theta I)(\boldsymbol{u})\|_2^2 = \left[ \begin{pmatrix} I_x(\boldsymbol{u}) \\ I_y(\boldsymbol{u}) \end{pmatrix}^{\!\top} \cdot \begin{pmatrix} \cos(\theta) \\ \sin(\theta) \end{pmatrix} \right]^2 \tag{16.21}$$

$$= \big(I_x(\boldsymbol{u})\cdot\cos(\theta) + I_y(\boldsymbol{u})\cdot\sin(\theta)\big)^2. \tag{16.22}$$

We will return to this result again later in Sec. 16.2.6. In the following, we use the root of the squared local contrast, that is, $\sqrt{S_\theta(I,\boldsymbol{u})}$, under the term local contrast.
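Eqn. (16.20) needs only three dot products of the Jacobian columns. The Java sketch below (class and method names are illustrative, not from the book's library) evaluates the squared local contrast and the local contrast for a given angle θ. The accompanying check confirms, for one example, that the expansion matches the direct norm ‖I_x cos θ + I_y sin θ‖² and that S_θ repeats with period π.

```java
/** Sketch: squared local contrast S_theta(I,u) from the Jacobian columns (Eqns. 16.18-16.20). */
final class LocalContrast {

    private LocalContrast() {}

    /** ix, iy: the 3D column vectors I_x(u), I_y(u) in R, G, B order. */
    static double squaredLocalContrast(double[] ix, double[] iy, double theta) {
        double a = dot(ix, ix); // I_x . I_x
        double b = dot(iy, iy); // I_y . I_y
        double c = dot(ix, iy); // I_x . I_y
        double cs = Math.cos(theta), sn = Math.sin(theta);
        return a * cs * cs + 2 * c * cs * sn + b * sn * sn;
    }

    /** Local contrast: square root of S_theta (tiny negative rounding errors are clamped to 0). */
    static double localContrast(double[] ix, double[] iy, double theta) {
        return Math.sqrt(Math.max(0, squaredLocalContrast(ix, iy, theta)));
    }

    static double dot(double[] p, double[] q) {
        double s = 0;
        for (int i = 0; i < p.length; i++) {
            s += p[i] * q[i];
        }
        return s;
    }
}
```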

7 See also Sec. C.2.2 in the Appendix (Eqn. (C.18)).

8 Note that $I_x^2 = I_x \cdot I_x$, $I_y^2 = I_y \cdot I_y$ and $I_x \cdot I_y$ in Eqn. (16.18) are dot products and thus the results are scalar values.
Fig. 16.3
Local image gradients and local contrast. In case of a scalar (grayscale) image $I$ (a), the local gradient $\nabla I$ defines a single plane that is tangential to the image function $I$ at position $\boldsymbol{u} = (u, v)$. In case of an RGB color image $I = (I_R, I_G, I_B)$ (b), the local gradients $\nabla I_R, \nabla I_G, \nabla I_B$ for each color channel define three tangent planes. The vertical axes in graphs (c, d) show the corresponding local contrast values $\sqrt{S_\theta(I,\boldsymbol{u})}$ (see Eqns. (16.18) and (16.19)) for all possible directions $\theta = 0, \ldots, 2\pi$.


Figure 16.3 illustrates the meaning of the squared local contrast in relation to the local image gradients. At a given image position $\boldsymbol{u}$, the local gradient $\nabla I(\boldsymbol{u})$ in a grayscale image (Fig. 16.3(a)) defines a single plane that is tangential to the image function $I$ at position $\boldsymbol{u}$. In case of a color image (Fig. 16.3(b)), each color channel defines an individual tangent plane. In Fig. 16.3(c, d) the local contrast values are shown as the height of cylindrical surfaces for all directions $\theta$. For a grayscale image (Fig. 16.3(c)), the local contrast changes linearly with the direction vector $(\cos(\theta), \sin(\theta))^\top$, while the relation is quadratic for a color image (Fig. 16.3(d)). To calculate the strength and orientation of edges we need to determine the direction of the maximum local contrast, which is described in the following.


16.2.4 Color Edge Magnitude


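The directions that maximize $S_\theta(I,\boldsymbol{u})$ in Eqn. (16.18) can be found analytically as the roots of the first partial derivative of $S$ with respect to the angle $\theta$, as originally suggested by Di Zenzo [63], and the resulting quantity is called maximum local contrast. As shown in [59], the maximum local contrast can also be found from the Jacobian matrix $\mathbf{J}_I$ (Eqn. (16.14)) as the largest eigenvalue of the (symmetric) 2 × 2 matrix

$$\mathbf{M}(\boldsymbol{u}) = \mathbf{J}_I^\top(\boldsymbol{u}) \cdot \mathbf{J}_I(\boldsymbol{u}) = \begin{pmatrix} I_x^2(\boldsymbol{u}) & I_x(\boldsymbol{u})\cdot I_y(\boldsymbol{u}) \\ I_y(\boldsymbol{u})\cdot I_x(\boldsymbol{u}) & I_y^2(\boldsymbol{u}) \end{pmatrix} \tag{16.23}$$

$$= \begin{pmatrix} A(\boldsymbol{u}) & C(\boldsymbol{u}) \\ C(\boldsymbol{u}) & B(\boldsymbol{u}) \end{pmatrix}, \tag{16.24}$$

with the elements

$$\begin{aligned} A(\boldsymbol{u}) &= I_x^2(\boldsymbol{u}) = I_x(\boldsymbol{u}) \cdot I_x(\boldsymbol{u}), \\ B(\boldsymbol{u}) &= I_y^2(\boldsymbol{u}) = I_y(\boldsymbol{u}) \cdot I_y(\boldsymbol{u}), \\ C(\boldsymbol{u}) &= I_x(\boldsymbol{u}) \cdot I_y(\boldsymbol{u}) = I_y(\boldsymbol{u}) \cdot I_x(\boldsymbol{u}). \end{aligned} \tag{16.25}$$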

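The matrix $\mathbf{M}(\boldsymbol{u})$ could be considered as the color equivalent to the local structure matrix used for corner detection on grayscale images in Chapter 7, Sec. 7.2.1. The two eigenvalues $\lambda_1, \lambda_2$ of $\mathbf{M}$ can be found in closed form as9

$$\begin{aligned} \lambda_1(\boldsymbol{u}) &= \Big(A + B + \sqrt{(A-B)^2 + 4\cdot C^2}\Big) / 2, \\ \lambda_2(\boldsymbol{u}) &= \Big(A + B - \sqrt{(A-B)^2 + 4\cdot C^2}\Big) / 2. \end{aligned} \tag{16.26}$$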

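Since $\mathbf{M}$ is symmetric, the expression under the square root in Eqn. (16.26) is non-negative and thus both eigenvalues are real. In addition, $\mathbf{M} = \mathbf{J}_I^\top \mathbf{J}_I$ is positive semi-definite (in particular, $A, B \ge 0$), so neither eigenvalue is negative, and since the square root term is non-negative, $\lambda_1$ is always the larger of the two eigenvalues. It is equivalent to the maximum squared local contrast (Eqn. (16.18)), that is,

$$\lambda_1(\boldsymbol{u}) = \max_{0 \le \theta < 2\pi} S_\theta(I,\boldsymbol{u}), \tag{16.27}$$

and thus can be used directly to quantify the local edge strength.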
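Eqn. (16.27) is easy to check numerically. The Java sketch below (illustrative names) computes λ1 and λ2 in closed form as in Eqn. (16.26) and compares λ1 with a brute-force maximum of S_θ = A·cos²θ + 2C·cosθ·sinθ + B·sin²θ over sampled angles; the number of sampling steps is an arbitrary choice for this illustration.

```java
/** Sketch: closed-form eigenvalues of M = [[A, C], [C, B]] and a brute-force check of Eqn. (16.27). */
final class StructureEigenvalues {

    private StructureEigenvalues() {}

    /** Returns {lambda1, lambda2} as in Eqn. (16.26). */
    static double[] eigenvalues(double a, double b, double c) {
        double root = Math.sqrt((a - b) * (a - b) + 4 * c * c);
        return new double[] {(a + b + root) / 2, (a + b - root) / 2};
    }

    /** Maximum of S_theta over 'steps' equally spaced angles in [0, 2 pi). */
    static double maxSquaredContrast(double a, double b, double c, int steps) {
        double best = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < steps; i++) {
            double t = 2 * Math.PI * i / steps;
            double cs = Math.cos(t), sn = Math.sin(t);
            best = Math.max(best, a * cs * cs + 2 * c * cs * sn + b * sn * sn);
        }
        return best;
    }
}
```

With A = 5, B = 2, C = 1.5, for example, the closed form gives λ1 = (7 + √18)/2 ≈ 5.62, and the sampled maximum agrees to within the sampling error.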
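The eigenvector associated with $\lambda_1(\boldsymbol{u})$ is

$$\boldsymbol{q}_1(\boldsymbol{u}) = \begin{pmatrix} A - B + \sqrt{(A-B)^2 + 4\cdot C^2} \\ 2\cdot C \end{pmatrix}, \tag{16.28}$$

or, equivalently, any multiple of $\boldsymbol{q}_1$.10 Thus the rate of change along the vector $\boldsymbol{q}_1$ is the same as in the opposite direction $-\boldsymbol{q}_1$, and it follows that the local contrast $S_\theta(I,\boldsymbol{u})$ at orientation $\theta$ is the same at orientation $\theta + k\pi$ (for any $k \in \mathbb{Z}$).11 As usual, the unit vector corresponding to $\boldsymbol{q}_1$ is obtained by scaling $\boldsymbol{q}_1$ by its magnitude, that is,

$$\hat{\boldsymbol{q}}_1 = \frac{1}{\|\boldsymbol{q}_1\|} \cdot \boldsymbol{q}_1. \tag{16.29}$$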


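An alternative method, proposed in [60], is to calculate the unit eigenvector $\hat{\boldsymbol{q}}_1 = (x_1, y_1)^\top$ in the form

$$\hat{\boldsymbol{q}}_1 = \left( \sqrt{\frac{1+\alpha}{2}},\; \mathrm{sgn}(C) \cdot \sqrt{\frac{1-\alpha}{2}} \right)^{\!\top}, \tag{16.30}$$

with $\alpha = (A - B) / \sqrt{(A-B)^2 + 4C^2}$, directly from the matrix elements $A, B, C$ defined in Eqn. (16.25).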

While $\boldsymbol{q}_1$ (the eigenvector associated with the greater eigenvalue of $\mathbf{M}$) points in the direction of maximum change, the second eigenvector $\boldsymbol{q}_2$ (associated with $\lambda_2$) is orthogonal to $\boldsymbol{q}_1$, that is, it has the same direction as the local edge tangent.
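The two ways of obtaining the unit eigenvector, Eqns. (16.28)/(16.29) and Eqn. (16.30), can be compared in a few lines of Java. One detail the formulas leave implicit: if C = 0 and A < B, Eqn. (16.28) yields the zero vector, and so does Eqn. (16.30) if sgn(0) = 0. In this sketch we handle that case explicitly by returning (0, 1)ᵀ (dominant change along y) and take sgn(0) = +1 in the alternative form; both are our own assumptions. When A = B and C = 0 the direction is undetermined and the sketch returns null.

```java
/** Sketch: unit eigenvector of M for lambda_1, via Eqns. (16.28)/(16.29) and via Eqn. (16.30). */
final class DominantEigenvector {

    private DominantEigenvector() {}

    /** Normalized q_1 from Eqn. (16.28); null if the direction is undetermined (A == B, C == 0). */
    static double[] fromEq28(double a, double b, double c) {
        double root = Math.sqrt((a - b) * (a - b) + 4 * c * c);
        if (root == 0) {
            return null;                  // A == B and C == 0: no preferred direction
        }
        double qx = a - b + root;
        double qy = 2 * c;
        double norm = Math.hypot(qx, qy);
        if (norm == 0) {
            return new double[] {0, 1};   // C == 0 and A < B: assumed fallback (y-axis)
        }
        return new double[] {qx / norm, qy / norm};
    }

    /** Alternative closed form of Eqn. (16.30), taking sgn(0) = +1 (assumption). */
    static double[] fromEq30(double a, double b, double c) {
        double root = Math.sqrt((a - b) * (a - b) + 4 * c * c);
        if (root == 0) {
            return null;
        }
        double alpha = (a - b) / root;
        double sign = (c < 0) ? -1 : 1;
        return new double[] {
            Math.sqrt(Math.max(0, (1 + alpha) / 2)),
            sign * Math.sqrt(Math.max(0, (1 - alpha) / 2))
        };
    }
}
```

For A = 5, B = 2, C = 1.5 both forms give approximately (0.924, 0.383)ᵀ, that is, a direction of 22.5°.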

9 See Sec. B.4 in the Appendix for details.

10 The eigenvalues of a matrix are unique, but the corresponding eigenvectors are not.

11 Thus the orientation of maximum change is inherently ambiguous [60].
16.2.5 Color Edge Orientation

The local orientation of the edge (i.e., the normal to the edge tangent) at a given position $\boldsymbol{u}$ can be obtained directly from the associated eigenvector $\boldsymbol{q}_1(\boldsymbol{u}) = (q_x(\boldsymbol{u}), q_y(\boldsymbol{u}))^\top$ using the relation

$$\tan(\theta_1(\boldsymbol{u})) = \frac{q_y(\boldsymbol{u})}{q_x(\boldsymbol{u})} = \frac{2\cdot C}{A - B + \sqrt{(A-B)^2 + 4\cdot C^2}}, \tag{16.31}$$

which can be simplified12 to

$$\tan(2\cdot\theta_1(\boldsymbol{u})) = \frac{2\cdot C}{A - B}. \tag{16.32}$$


Unless both $A = B$ and $C = 0$ (in which case the edge orientation is undetermined), the angle of maximum local contrast or color edge orientation can be calculated as

$$\theta_1(\boldsymbol{u}) = \frac{1}{2}\cdot\tan^{-1}\!\Big(\frac{2\cdot C}{A - B}\Big) = \frac{1}{2}\cdot\mathrm{ArcTan}(A - B,\, 2\cdot C). \tag{16.33}$$
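In Java, the two-argument ArcTan(x, y) of Eqn. (16.33) corresponds to `Math.atan2(y, x)`; note the swapped argument order. A minimal sketch, assuming NaN is an acceptable marker for the undetermined case A = B, C = 0:

```java
/** Sketch: color edge orientation theta_1 = 0.5 * ArcTan(A - B, 2C), Eqn. (16.33). */
final class EdgeOrientation {

    private EdgeOrientation() {}

    /** Returns theta_1 in [-pi/2, pi/2], or NaN if A == B and C == 0 (orientation undetermined). */
    static double theta1(double a, double b, double c) {
        if (a == b && c == 0) {
            return Double.NaN;
        }
        return 0.5 * Math.atan2(2 * c, a - b); // Math.atan2(y, x) corresponds to ArcTan(x, y)
    }
}
```

Because of the factor ½, the result only covers a half-turn, which matches the θ + kπ ambiguity of the edge orientation discussed above.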

The above steps are summarized in Alg. 16.2, which is a color edge operator based on the first derivatives of the image function (see Sec. 16.5 for the corresponding Java implementation). It is similar to the algorithm proposed by Di Zenzo [63] but uses the eigenvalues of the local structure matrix for calculating edge magnitude and orientation, as suggested in [59] (see Eqn. (16.24)).
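For experimenting outside ImageJ, here is a compact, dependency-free Java sketch of the steps in Alg. 16.2, operating on a plain array `rgb[c][v][u]` (a hypothetical layout). It is not the book's DiZenzoCumaniEdgeDetector class. For simplicity, border pixels are skipped (left at 0), and the Sobel-type kernels are applied as a correlation; using true convolution would negate both gradient components and leave A, B, C, and therefore E and Φ, unchanged.

```java
/** Sketch of Alg. 16.2 (multi-gradient color edge operator) on plain arrays; not the book's class. */
final class MultiGradientEdgeSketch {

    private MultiGradientEdgeSketch() {}

    // Sobel-type kernels of Alg. 16.2 without the common factor 1/8, indexed [row][column]
    private static final double[][] HX = {{-1, 0, 1}, {-2, 0, 2}, {-1, 0, 1}};
    private static final double[][] HY = {{-1, -2, -1}, {0, 0, 0}, {1, 2, 1}};

    /**
     * rgb[c][v][u]: channel c (R, G, B), row v, column u (hypothetical layout).
     * Returns {E, Phi}, each double[h][w]; border pixels are left at 0.
     */
    static double[][][] apply(float[][][] rgb) {
        int h = rgb[0].length, w = rgb[0][0].length;
        double[][] e = new double[h][w];
        double[][] phi = new double[h][w];
        for (int v = 1; v < h - 1; v++) {
            for (int u = 1; u < w - 1; u++) {
                double a = 0, b = 0, c = 0; // elements of M, Eqn. (16.25)
                for (int k = 0; k < 3; k++) {
                    double gx = 0, gy = 0;
                    for (int j = -1; j <= 1; j++) {
                        for (int i = -1; i <= 1; i++) {
                            double p = rgb[k][v + j][u + i];
                            gx += HX[j + 1][i + 1] * p;
                            gy += HY[j + 1][i + 1] * p;
                        }
                    }
                    gx /= 8;
                    gy /= 8;
                    a += gx * gx;
                    b += gy * gy;
                    c += gx * gy;
                }
                double lambda1 = (a + b + Math.sqrt((a - b) * (a - b) + 4 * c * c)) / 2; // Eqn. (16.26)
                e[v][u] = Math.sqrt(lambda1);                                              // Eqn. (16.27)
                phi[v][u] = 0.5 * Math.atan2(2 * c, a - b);                                // Eqn. (16.33)
            }
        }
        return new double[][][] {e, phi};
    }
}
```

A red ramp rising by 2 per column produces E = 2 and Φ = 0 at interior pixels; a green ramp rising by 3 per row produces E = 3 and |Φ| = π/2.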

Results of the monochromatic edge operator in Alg. 16.1 and the Di Zenzo-Cumani multi-gradient operator in Alg. 16.2 are compared in Fig. 16.4. The synthetic test image in Fig. 16.4(a) has constant luminance (brightness) and thus no gray-value operator should be able to detect edges in this image. The local edge strength $E(\boldsymbol{u})$ produced by the two operators is very similar (Fig. 16.4(b)). The vectors in Fig. 16.4(c–f) show the orientation of the edge tangents, which are normal to the direction of maximum color contrast, $\Phi(\boldsymbol{u})$. The length of each tangent vector is proportional to the local edge strength $E(\boldsymbol{u})$.

Figure 16.5 shows two examples of applying the Di Zenzo-Cumani-style color edge operator (Alg. 16.2) to real images. Note that the multi-gradient edge magnitude (calculated from the eigenvalue $\lambda_1$ in Eqn. (16.27)) in Fig. 16.5(b) is virtually identical to the monochromatic edge magnitude $E_\mathrm{mag}$ under the L2 norm in Fig. 16.2(d). The larger difference to the result for the L∞ norm in Fig. 16.2(e) is shown in Fig. 16.5(c).

Thus, considering only edge magnitude, the Di Zenzo-Cumani operator has no significant advantage over the simpler, monochromatic operator in Sec. 16.1. However, if edge orientation is important (as in the color version of the Canny operator described in Sec. 16.3), the Di Zenzo-Cumani technique is certainly more reliable and consistent.


12 Using the relation tan(2θ) = [2·tan(θ)] / [1 − tan²(θ)].

Alg. 16.2
Di Zenzo/Cumani-style multi-gradient color edge operator. A pair of Sobel-type filters (H^S_x, H^S_y) is used for estimating the local x/y gradients in each component of the RGB input image I. The procedure returns a pair of maps, holding the edge magnitude E(𝒖) and the edge orientation Φ(𝒖), respectively.

1:  MultiGradientColorEdge(I)
      Input: I = (I_R, I_G, I_B), an RGB color image of size M × N.
      Returns a pair of maps (E, Φ) for edge magnitude and orientation.
2:    H^S_x := 1/8 · [ −1 0 1 ; −2 0 2 ; −1 0 1 ],
3:    H^S_y := 1/8 · [ −1 −2 −1 ; 0 0 0 ; 1 2 1 ]          ▷ x/y gradient kernels
4:    (M, N) ← Size(I)
5:    Create maps E, Φ : M × N → ℝ                         ▷ edge magnitude/orientation
6:    I_R,x ← I_R ∗ H^S_x,  I_R,y ← I_R ∗ H^S_y              ▷ apply gradient filters
7:    I_G,x ← I_G ∗ H^S_x,  I_G,y ← I_G ∗ H^S_y
8:    I_B,x ← I_B ∗ H^S_x,  I_B,y ← I_B ∗ H^S_y
9:    for all 𝒖 ∈ M × N do
10:     (r_x, g_x, b_x) ← (I_R,x(𝒖), I_G,x(𝒖), I_B,x(𝒖))
11:     (r_y, g_y, b_y) ← (I_R,y(𝒖), I_G,y(𝒖), I_B,y(𝒖))
12:     A ← r_x² + g_x² + b_x²                               ▷ A = I_x·I_x
13:     B ← r_y² + g_y² + b_y²                               ▷ B = I_y·I_y
14:     C ← r_x·r_y + g_x·g_y + b_x·b_y                      ▷ C = I_x·I_y
15:     λ1 ← (A + B + √((A − B)² + 4·C²)) / 2                ▷ Eq. 16.26
16:     E(𝒖) ← √λ1                                           ▷ Eq. 16.27
17:     Φ(𝒖) ← ½ · ArcTan(A − B, 2·C)                        ▷ Eq. 16.33
18:   return (E, Φ)

16.2.6 Grayscale Gradients Revisited

As one might have guessed, the usual gradient-based calculation of the edge orientation (see Ch. 6, Sec. 6.2) is only a special case of the multi-dimensional gradient calculation described already. Given a scalar image $I$, the intensity gradient vector $(\nabla I)(\boldsymbol{u}) = (I_x(\boldsymbol{u}), I_y(\boldsymbol{u}))^\top$ defines a single plane that is tangential to the image function at position $\boldsymbol{u}$, as illustrated in Fig. 16.3(a). With

$$A = I_x^2(\boldsymbol{u}), \quad B = I_y^2(\boldsymbol{u}), \quad C = I_x(\boldsymbol{u})\cdot I_y(\boldsymbol{u}) \tag{16.34}$$

(analogous to Eqn. (16.25)) the squared local contrast at position $\boldsymbol{u}$ in direction $\theta$ (as defined in Eqn. (16.18)) is

$$S_\theta(I,\boldsymbol{u}) = \big(I_x(\boldsymbol{u})\cdot\cos(\theta) + I_y(\boldsymbol{u})\cdot\sin(\theta)\big)^2. \tag{16.35}$$

From Eqn. (16.26), the eigenvalues of the local structure matrix $\mathbf{M} = \begin{pmatrix} A & C \\ C & B \end{pmatrix}$ at position $\boldsymbol{u}$ are

$$\lambda_{1,2}(\boldsymbol{u}) = \Big(A + B \pm \sqrt{(A-B)^2 + 4\cdot C^2}\Big) / 2, \tag{16.36}$$

but here, with $I_x, I_y$ not being vectors but scalar values, we get $C^2 = (I_x \cdot I_y)^2 = I_x^2 \cdot I_y^2 = A \cdot B$, such that $(A - B)^2 + 4C^2 = (A + B)^2$, and therefore

$$\lambda_{1,2}(\boldsymbol{u}) = \big(A + B \pm (A + B)\big) / 2. \tag{16.37}$$

We see that, for a scalar-valued image, the dominant eigenvalue,

$$\lambda_1(\boldsymbol{u}) = A + B = I_x^2(\boldsymbol{u}) + I_y^2(\boldsymbol{u}) = \|\nabla I(\boldsymbol{u})\|_2^2, \tag{16.38}$$

is simply the squared L2 norm of the local gradient vector, while the smaller eigenvalue $\lambda_2$ is always zero. Thus, for a grayscale image, the maximum edge strength $\sqrt{\lambda_1(\boldsymbol{u})} = \|\nabla I(\boldsymbol{u})\|_2$ is equivalent to the magnitude of the local intensity gradient.13 The fact that $\lambda_2 = 0$ indicates that the local contrast in the orthogonal direction (i.e., along the edge tangent) is zero (see Fig. 16.3(c)).

Fig. 16.4
Results from the monochromatic (Alg. 16.1) and the Di Zenzo-Cumani color edge operators (Alg. 16.2). The original color image (a) has constant luminance, that is, the intensity gradient is zero and thus a simple grayscale operator would not detect any edges at all. The local edge strength $E(\boldsymbol{u})$ is almost identical for both color edge operators (b). Edge tangent orientation vectors (normal to $\Phi(\boldsymbol{u})$) for the monochromatic and multi-gradient operators (c, d); enlarged details in (e, f).
[Panels: (a) Original image; (b) Color edge strength; (c, e) Monochromatic operator; (d, f) Di Zenzo-Cumani operator]

To calculate the local edge orientation at position $\boldsymbol{u}$, we use Eqn. (16.31) to get

$$\tan(\theta_1(\boldsymbol{u})) = \frac{2\cdot C}{A - B + (A + B)} = \frac{2\cdot I_x(\boldsymbol{u})\cdot I_y(\boldsymbol{u})}{2\cdot I_x^2(\boldsymbol{u})} = \frac{I_y(\boldsymbol{u})}{I_x(\boldsymbol{u})} \tag{16.39}$$

and the direction of maximum contrast14 is then found as

$$\theta_1(\boldsymbol{u}) = \tan^{-1}\!\Big(\frac{I_y(\boldsymbol{u})}{I_x(\boldsymbol{u})}\Big) = \mathrm{ArcTan}\big(I_x(\boldsymbol{u}), I_y(\boldsymbol{u})\big). \tag{16.40}$$

13 See Eqns. (6.5) and (6.13) in Chapter 6, Sec. 6.2.

Fig. 16.5
Results of the Di Zenzo-Cumani color edge operator (Alg. 16.2) on real images. Original image (a) and inverted color edge magnitude (b). The images in (c) show the differences to the edge magnitude returned by the monochromatic operator (Alg. 16.1, using the L∞ norm).


Thus, for scalar-valued images, the general (multi-dimensional) technique based on the eigenvalues of the structure matrix leads to exactly the same result as the conventional grayscale edge detection approach described in Chapter 6, Sec. 6.3.
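The scalar special case of Eqns. (16.36) to (16.40) can be confirmed numerically. The Java sketch below (illustrative names) forms A, B, C from scalar derivatives I_x, I_y and lets one check that λ1 equals the squared gradient magnitude, that λ2 is zero, and that the structure-matrix orientation agrees with the conventional `Math.atan2(iy, ix)` up to a multiple of π, which is the θ + kπ ambiguity noted in Sec. 16.2.4.

```java
/** Sketch: structure-matrix quantities for a scalar image reduce to the ordinary gradient (Sec. 16.2.6). */
final class ScalarGradientCase {

    private ScalarGradientCase() {}

    /** {lambda1, lambda2} for A = Ix^2, B = Iy^2, C = Ix*Iy (Eqns. 16.34, 16.36). */
    static double[] eigenvalues(double ix, double iy) {
        double a = ix * ix, b = iy * iy, c = ix * iy;
        double root = Math.sqrt((a - b) * (a - b) + 4 * c * c);
        return new double[] {(a + b + root) / 2, (a + b - root) / 2};
    }

    /** Orientation via Eqn. (16.33), in [-pi/2, pi/2]. */
    static double thetaFromStructure(double ix, double iy) {
        double a = ix * ix, b = iy * iy, c = ix * iy;
        return 0.5 * Math.atan2(2 * c, a - b);
    }

    /** Conventional gradient orientation, Eqn. (16.40), in [-pi, pi]. */
    static double thetaFromGradient(double ix, double iy) {
        return Math.atan2(iy, ix);
    }

    /** Difference of two angles reduced modulo pi to [-pi/2, pi/2]. */
    static double differenceModPi(double t1, double t2) {
        return Math.IEEEremainder(t1 - t2, Math.PI);
    }
}
```

For I_x = 3, I_y = −4, for example, the sketch gives λ1 = 25 = ‖∇I‖² and λ2 = 0.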


16.3 Canny Edge Detector for Color Images

Like most other edge operators, the Canny detector was originally designed for grayscale (i.e., scalar-valued) images. To use it on color images, a trivial approach is to apply the monochromatic operator separately to each of the color channels and subsequently merge the results into a single edge map. However, since edges within the different color channels rarely occur in the same places, the result will usually contain multiple edge marks and undesirable clutter (see Fig. 16.8 for an example).

14 See Eqn. (6.14) in Chapter 6.

Fortunately, the original grayscale version of the Canny edge detector can be easily adapted to color imagery using the multi-gradient concept described in Sec. 16.2.1. The only changes required in Alg. 6.1 are the calculation of the local gradients and the edge magnitude $E_\mathrm{mag}$. The modified procedure is shown in Alg. 16.3 (see Sec. 16.5 for the corresponding Java implementation).
Alg. 16.3
Canny edge detector for color images. Structure and parameters are identical to the grayscale version in Alg. 6.1 (p. 135). In the algorithm below, edge magnitude (E_mag) and orientation (E_x, E_y) are obtained from the gradients of the individual color channels (as described in Sec. 16.2.1).

1:  ColorCannyEdgeDetector(I, σ, t_hi, t_lo)
      Input: I = (I_R, I_G, I_B), an RGB color image of size M × N; σ, radius of Gaussian filter H^{G,σ}; t_hi, t_lo, hysteresis thresholds (t_hi > t_lo). Returns a binary edge image of size M × N.
2:    Ī_R ← I_R ∗ H^{G,σ}                              ▷ blur components with Gaussian of width σ
3:    Ī_G ← I_G ∗ H^{G,σ}
4:    Ī_B ← I_B ∗ H^{G,σ}
5:    H_x ← [ −0.5 0 0.5 ]                             ▷ x gradient filter
6:    H_y ← [ −0.5 0 0.5 ]ᵀ                            ▷ y gradient filter
7:    Ī_R,x ← Ī_R ∗ H_x,  Ī_R,y ← Ī_R ∗ H_y
8:    Ī_G,x ← Ī_G ∗ H_x,  Ī_G,y ← Ī_G ∗ H_y
9:    Ī_B,x ← Ī_B ∗ H_x,  Ī_B,y ← Ī_B ∗ H_y
10:   (M, N) ← Size(I)
11:   Create maps: E_mag, E_nms, E_x, E_y : M × N → ℝ
12:                E_bin : M × N → {0, 1}
13:   for all image coordinates 𝒖 ∈ M × N do
14:     (r_x, g_x, b_x) ← (Ī_R,x(𝒖), Ī_G,x(𝒖), Ī_B,x(𝒖))
15:     (r_y, g_y, b_y) ← (Ī_R,y(𝒖), Ī_G,y(𝒖), Ī_B,y(𝒖))
16:     A ← r_x² + g_x² + b_x²
17:     B ← r_y² + g_y² + b_y²
18:     C ← r_x·r_y + g_x·g_y + b_x·b_y
19:     D ← [(A − B)² + 4·C²]^{1/2}
20:     E_mag(𝒖) ← [0.5·(A + B + D)]^{1/2}             ▷ √λ1, Eq. 16.27
21:     E_x(𝒖) ← A − B + D                             ▷ q_1, Eq. 16.28
22:     E_y(𝒖) ← 2·C
23:     E_nms(𝒖) ← 0
24:     E_bin(𝒖) ← 0
25:   for u ← 1, ..., M − 2 do
26:     for v ← 1, ..., N − 2 do
27:       𝒖 ← (u, v)
28:       d_x ← E_x(𝒖)
29:       d_y ← E_y(𝒖)
30:       s ← GetOrientationSector(d_x, d_y)           ▷ Alg. 6.2
31:       if IsLocalMax(E_mag, 𝒖, s, t_lo) then        ▷ Alg. 6.2
32:         E_nms(𝒖) ← E_mag(𝒖)
33:   for u ← 1, ..., M − 2 do
34:     for v ← 1, ..., N − 2 do
35:       𝒖 ← (u, v)
36:       if (E_nms(𝒖) > t_hi ∧ E_bin(𝒖) = 0) then
37:         TraceAndThreshold(E_nms, E_bin, u, v, t_lo)   ▷ Alg. 6.2
38:   return E_bin
In the pre-processing step, each of the three color channels is individually smoothed by a Gaussian filter of width $\sigma$, before calculating the gradient vectors (Alg. 16.3, lines 2–9). As in Alg. 16.2, the color edge magnitude is calculated as the local contrast, that is, the square root of the maximum squared local contrast given by the dominant eigenvalue of the structure matrix $\mathbf{M}$ (Eqns. (16.24)–(16.27)). The local gradient vector $(E_x, E_y)$ is calculated from the elements $A, B, C$ of the matrix $\mathbf{M}$, as given in Eqn. (16.28). The corresponding steps are found in Alg. 16.3, lines 14–22. The remaining steps, including non-maximum suppression, edge tracing and thresholding, are exactly the same as in Alg. 6.1.
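Lines 14–22 of Alg. 16.3 are the only color-specific part of the detector. The following Java sketch reproduces just that step on a plain array `rgb[c][v][u]` (hypothetical layout) that is assumed to be Gaussian-smoothed already. It uses the kernel H_x = [−0.5 0 0.5] and its transpose, leaves border pixels at zero, and omits non-maximum suppression and edge tracing (Alg. 6.2), which are unchanged from the grayscale detector.

```java
/** Sketch of the color-specific step of Alg. 16.3 (lines 14-22): E_mag, E_x, E_y per pixel. */
final class ColorCannyGradients {

    private ColorCannyGradients() {}

    /**
     * rgb[c][v][u]: Gaussian-smoothed channels (hypothetical layout: channel, row, column).
     * Returns {eMag, eX, eY}, each float[h][w]; border pixels are left at 0.
     */
    static float[][][] compute(float[][][] rgb) {
        int h = rgb[0].length, w = rgb[0][0].length;
        float[][] eMag = new float[h][w];
        float[][] eX = new float[h][w];
        float[][] eY = new float[h][w];
        for (int v = 1; v < h - 1; v++) {
            for (int u = 1; u < w - 1; u++) {
                double a = 0, b = 0, c = 0;
                for (int k = 0; k < 3; k++) {
                    double dx = 0.5 * (rgb[k][v][u + 1] - rgb[k][v][u - 1]); // H_x = [-0.5 0 0.5]
                    double dy = 0.5 * (rgb[k][v + 1][u] - rgb[k][v - 1][u]); // H_y = H_x transposed
                    a += dx * dx;
                    b += dy * dy;
                    c += dx * dy;
                }
                double d = Math.sqrt((a - b) * (a - b) + 4 * c * c);
                eMag[v][u] = (float) Math.sqrt(0.5 * (a + b + d)); // sqrt(lambda_1), Eqn. (16.27)
                eX[v][u] = (float) (a - b + d);                    // q_1, Eqn. (16.28)
                eY[v][u] = (float) (2 * c);
            }
        }
        return new float[][][] {eMag, eX, eY};
    }
}
```

Note that (E_x, E_y) is left unnormalized, as in Alg. 16.3; only its direction matters for selecting the orientation sector.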

Results from the grayscale and color version of the Canny edge detector are compared in Figs. 16.6 and 16.7 for varying values of $\sigma$ and $t_\mathrm{hi}$, respectively. In all cases, the gradient magnitude was normalized and the threshold values $t_\mathrm{hi}, t_\mathrm{lo}$ are given as a percentage of the maximum edge magnitude. Evidently, the color detector gives the more consistent results, particularly at color edges with low intensity difference.
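To make the percentage thresholds concrete, here is a generic Java sketch (not the book's TraceAndThreshold procedure from Alg. 6.2) that converts t_hi and t_lo from percent of the maximum edge magnitude into absolute values and then applies hysteresis thresholding to a non-maximum-suppressed magnitude map: pixels at or above the high threshold start an edge, which is grown through 8-connected neighbors at or above the low threshold. The array layout and the ≥ comparisons are assumptions of this sketch.

```java
import java.util.ArrayDeque;

/** Generic hysteresis thresholding sketch with thresholds given in percent of the maximum magnitude. */
final class Hysteresis {

    private Hysteresis() {}

    /** Converts a threshold in percent of max(e) into an absolute value. */
    static float absoluteThreshold(float[][] e, float percent) {
        float max = 0;
        for (float[] row : e) {
            for (float x : row) {
                max = Math.max(max, x);
            }
        }
        return max * percent / 100f;
    }

    /** Marks pixels connected (8-neighborhood, values >= tLo) to a seed pixel with value >= tHi. */
    static boolean[][] threshold(float[][] e, float tHi, float tLo) {
        int h = e.length, w = e[0].length;
        boolean[][] bin = new boolean[h][w];
        ArrayDeque<int[]> stack = new ArrayDeque<>();
        for (int v = 0; v < h; v++) {
            for (int u = 0; u < w; u++) {
                if (e[v][u] < tHi || bin[v][u]) {
                    continue;
                }
                bin[v][u] = true;                 // new seed pixel
                stack.push(new int[] {u, v});
                while (!stack.isEmpty()) {
                    int[] p = stack.pop();
                    for (int dv = -1; dv <= 1; dv++) {
                        for (int du = -1; du <= 1; du++) {
                            int x = p[0] + du, y = p[1] + dv;
                            if (x < 0 || y < 0 || x >= w || y >= h) {
                                continue;
                            }
                            if (!bin[y][x] && e[y][x] >= tLo) {
                                bin[y][x] = true; // weak pixel attached to a strong edge
                                stack.push(new int[] {x, y});
                            }
                        }
                    }
                }
            }
        }
        return bin;
    }
}
```

Isolated weak responses that never touch a strong pixel are discarded, which is what suppresses most noise-induced edge fragments.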

For comparison, Fig. 16.8 shows the results of applying the monochromatic Canny operator separately to each color channel and subsequently merging the edge pixels into a combined edge map, as mentioned at the beginning of this section. We see that this leads to multiple responses and cluttered edges, since maximum gradient positions in the different color channels are generally not collocated.

In summary, the Canny edge detector is superior to simpler schemes based on first-order gradients and global thresholding, in terms of extracting clean and well-located edges that are immediately useful for subsequent processing. The results in Figs. 16.6 and 16.7 demonstrate that the use of color gives additional improvements over the grayscale approach, since edges with insufficient brightness gradients can still be detected from local color differences. Essential for the good performance of the color Canny edge detector, however, is the reliable calculation of the gradient direction, based on the multi-dimensional local contrast formulation given in Sec. 16.2.3. Quite a few variations of Canny detectors for color images have been proposed in the literature, including the one attributed to Kanade (in [140]), which is similar to the algorithm described here.


16.4 Other Color Edge Operators

The idea of using a vector field model in the context of color edge detection was first presented by Di Zenzo [63], who suggested finding the orientation of maximum change by maximizing $S_\theta(I,\boldsymbol{u})$ in Eqn. (16.18) over the angle $\theta$. Later Cumani [59, 60] proposed directly using the eigenvalues and eigenvectors of the local structure matrix $\mathbf{M}$ (Eqn. (16.24)) for calculating edge strength and orientation. He also proposed using the zero-crossings of the second-order gradients along the direction of maximum contrast to precisely locate edges, which is a general problem with first-order techniques. Both Di Zenzo and Cumani used only the dominant eigenvalue, indicating the edge strength perpendicular to the edge (if an edge existed at all), and then discarded the smaller eigenvalue, which is proportional to the edge strength in the perpendicular (i.e., tangential) direction. Real edges only exist where the larger eigenvalue is considerably greater than the smaller one. If both eigenvalues have similar values, this indicates that the local image surface exhibits change in all directions, which is not typically true at an edge but quite characteristic of flat, noisy regions and corners. One solution therefore is to use the difference between the eigenvalues, $\lambda_1 - \lambda_2$, to quantify edge strength [206].

Fig. 16.6
Canny grayscale vs. color version. Results from the grayscale (left) and the color version (right) of the Canny operator for different values of σ (t_hi = 20%, t_lo = 5% of max. edge magnitude).
[Panels (a)–(f): Canny (grayscale), left column; Canny (color), right column; panel labels include σ = 0.5 and σ = 1.0]

Fig. 16.7
Canny grayscale vs. color version. Results from the grayscale (left) and the color version (right) of the Canny operator for different threshold values t_hi, given in % of max. edge magnitude (t_lo = 5%, σ = 2.0).
[Panels (a), (b): Canny (grayscale), left; Canny (color), right]

Fig. 16.8
Scalar vs. vector-based color Canny operator. Results from the scalar Canny operator applied separately to each color channel (a, b). Channel edges are shown in corresponding colors, with mixed colors indicating that edge points were detected in multiple channels (e.g., yellow marks overlapping points from the red and the green channel). A black pixel indicates that an edge point was detected in all three color channels. Channel edges combined into a joint edge map (c, d). For comparison, the result of the vector-based color Canny operator (e, f). Common parameter settings are σ = 2.0 and 5.0, t_hi = 20%, t_lo = 5% of max. edge magnitude.
[Panel labels include σ = 2.0 and σ = 5.0]
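Since λ1 and λ2 in Eqn. (16.26) differ only in the sign of the root term, the eigenvalue difference suggested in [206] is simply λ1 − λ2 = √((A − B)² + 4C²). The short Java sketch below (illustrative names) contrasts this measure with λ1 alone for an edge-like and an isotropic (corner- or noise-like) structure matrix.

```java
/** Sketch: dominant eigenvalue vs. eigenvalue difference as edge strength measures. */
final class EigenEdgeStrength {

    private EigenEdgeStrength() {}

    /** Dominant eigenvalue lambda_1 of [[A, C], [C, B]] (Eqn. 16.26). */
    static double lambda1(double a, double b, double c) {
        return (a + b + Math.sqrt((a - b) * (a - b) + 4 * c * c)) / 2;
    }

    /** lambda_1 - lambda_2, which reduces to the square-root term of Eqn. (16.26). */
    static double eigenDifference(double a, double b, double c) {
        return Math.sqrt((a - b) * (a - b) + 4 * c * c);
    }
}
```

For A = 4, B = 0, C = 0 (change along x only) both measures give 4; for A = B = 4, C = 0 (equal change in all directions) λ1 is still 4, while λ1 − λ2 drops to 0.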

Several color versions of the Canny edge detector can be found in the literature, such as the one proposed by Kanade (in [140]), which is very similar to the algorithm presented here. Other approaches of adapting the Canny detector for color images can be found in [85]. In addition to Canny’s scheme, other types of color edge detectors have been used successfully, including techniques based on vector order statistics and color difference vectors. Excellent surveys of the various color edge detection approaches can be found in [266] and [141, Ch. 6].
16.5 Java Implementation


The following Java implementations of the algorithms described in this chapter can be found in the source code section15 of the book’s website. The common (abstract) super-class for all color edge detectors is ColorEdgeDetector, which mainly provides the following methods:

FloatProcessor getEdgeMagnitude()
    Returns the resulting edge magnitude map E(u) as a FloatProcessor object.

FloatProcessor getEdgeOrientation()
    Returns the resulting edge orientation map Φ(u) as a FloatProcessor object, with values in the range [−π, π].

The following edge detectors are defined as concrete sub-classes of ColorEdgeDetector:

GrayscaleEdgeDetector: Implements an edge detector that uses only the intensity (brightness) of the supplied color image.
MonochromaticEdgeDetector: Implements the monochromatic color edge detector described in Alg. 16.1.
DiZenzoCumaniEdgeDetector: Implements the Di Zenzo-Cumani type color edge detector described in Alg. 16.2.
CannyEdgeDetector: Implements the Canny edge detector for grayscale and color images described in Alg. 16.3. This class defines the additional methods

    ByteProcessor getEdgeBinary(),
    List<List<java.awt.Point>> getEdgeTraces().

Program 16.1 shows a complete example for the use of the class CannyEdgeDetector in the context of an ImageJ plugin.
15 Package imagingbook.pub.color.edge.

Prog. 16.1
Use of the CannyEdgeDetector class in an ImageJ plugin. A parameter object (params) is created in line 20, subsequently configured (in lines 22–24) and finally used to construct a CannyEdgeDetector object in line 27. Note that edge detection is performed within the constructor method. Lines 29–33 demonstrate how different types of edge detection results can be retrieved. The binary edge map eBin is displayed in line 35. As indicated in the setup() method (by returning DOES_ALL), this plugin works with any type of image.

 1 import ij.ImagePlus;
 2 import ij.plugin.filter.PlugInFilter;
 3 import ij.process.ByteProcessor;
 4 import ij.process.FloatProcessor;
 5 import ij.process.ImageProcessor;
 6 import imagingbook.pub.coloredge.CannyEdgeDetector;
 7
 8 import java.awt.Point;
 9 import java.util.List;
10
11 public class Canny_Edge_Demo implements PlugInFilter {
12
13   public int setup(String arg0, ImagePlus imp) {
14     return DOES_ALL + NO_CHANGES;
15   }
16
17   public void run(ImageProcessor ip) {
18
19     CannyEdgeDetector.Parameters params =
20         new CannyEdgeDetector.Parameters();
21
22     params.gSigma = 3.0f;   // σ of Gaussian
23     params.hiThr = 20.0f;   // 20% of max. edge magnitude
24     params.loThr = 5.0f;    // 5% of max. edge magnitude
25
26     CannyEdgeDetector detector =
27         new CannyEdgeDetector(ip, params);
28
29     FloatProcessor eMag = detector.getEdgeMagnitude();
30     FloatProcessor eOrt = detector.getEdgeOrientation();
31     ByteProcessor eBin = detector.getEdgeBinary();
32     List<List<Point>> edgeTraces =
33         detector.getEdgeTraces();
34
35     (new ImagePlus("Canny Edges", eBin)).show();
36
37     // process edge detection results ...
38   }
39 }




17 Edge-Preserving Smoothing Filters


Noise reduction in images is a common objective in image processing, not only for producing pleasing results for human viewing but also to facilitate the extraction of meaningful information in subsequent steps, for example, in segmentation or feature detection. Simple smoothing filters, such as the Gaussian filter¹ and the filters discussed in Chapter 15, effectively perform low-pass filtering and thus remove high-frequency noise. However, they also tend to suppress high-rate intensity variations that are part of the original signal, thereby destroying image structures that are visually important. The filters described in this chapter are “edge preserving” in the sense that they change their smoothing behavior adaptively depending upon the local image structure. In general, maximum smoothing is performed over “flat” (uniform) image regions, while smoothing is reduced near or across edge-like structures, typically characterized by high intensity gradients.

In the following, three classical types of edge-preserving filters are presented, which are based on largely different strategies. The Kuwahara-type filters described in Sec. 17.1 partition the filter kernel into smaller subkernels and select the most “homogeneous” of the underlying image regions for calculating the filter’s result. In contrast, the bilateral filter in Sec. 17.2 uses the differences between pixel values to control how much each individual pixel in the filter region contributes to the local average. Pixels that are similar to the current center pixel contribute strongly, while highly different pixels add little to the result. Thus, in a sense, the bilateral filter is a non-homogeneous linear filter with a convolution kernel that is adaptively controlled by the local image content. Finally, the anisotropic diffusion filters in Sec. 17.3 iteratively smooth the image in a way similar to the process of thermal diffusion, using the image gradient to block the local diffusion at edges and similar structures. Note that all filters described in this chapter are nonlinear and can be applied to either grayscale or color images.

1  See  Chapter  5,  Sec.  5.2. 
17.1 Kuwahara-Type Filters


The filters described in this section are all based on a similar concept that has its early roots in the work of Kuwahara et al. [144]. Although many variations have been proposed by other authors, we summarize them here under the term “Kuwahara-type” to indicate their origin and algorithmic similarities.

In principle, these filters work by calculating the mean and variance within neighboring image regions and selecting the mean value of the most “homogeneous” region, that is, the one with the smallest variance, to replace the original (center) pixel. For this purpose, the filter region $R$ is divided into $K$ partially overlapping subregions $R_1, R_2, \ldots, R_K$. At every image position $(u, v)$, the mean $\mu_k$ and the variance $\sigma_k^2$ of each subregion $R_k$ are calculated from the corresponding pixel values in $I$ as


Rk(Vu,v)  = 


<7k(Vu,v)  = 


1 

Rk 

1 

Rk 

l 

R 


k 


_  ^ 

•  22  I(n  +  i-  '•+./')  =  —  •  Shk(I,u,v). 

/  Ij  7* 

(i,j)£Rk 

■T,w  u+i ,  v+j)  —  nk(I,  u,  v)) 

(iJ)€Rk  n  . 

/  S?k(I,u,v)\ 

•{S2tk(I,u,v)- 


for  k  =  1, . . . ,  77,  with2 


Siik{I,u,v)  =  ^2l(u+i,v+j), 

(i,j)eRk 

S2,k(.I,u,v)  =  Yi  I2 (u+i,  V+j). 

(iJ)eRk 


(17.1) 

(17.2) 

(17.3) 

(17.4) 

(17.5) 
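The sums $S_{1,k}$ and $S_{2,k}$ let the mean and variance be obtained in a single pass over the subregion. The following is a minimal sketch, not the book's code: it assumes a grayscale image stored as float[][] and indexed img[u][v], a subregion given as an array of (i, j) offsets, and hypothetical class and method names. The two-pass definition of Eqn. (17.2) is included only for comparison.

```java
// Sketch of Eqns. (17.1)-(17.5): statistics of one subregion R_k (hypothetical class).
// Assumptions: float[][] image indexed img[u][v]; region = array of (i, j) offsets.
final class SubregionStats {

    // Returns {mean, variance} computed in a single pass over R_k.
    static double[] meanAndVariance(float[][] img, int[][] region, int u, int v) {
        int n = region.length;                       // |R_k|
        double s1 = 0, s2 = 0;
        for (int[] d : region) {
            double a = img[u + d[0]][v + d[1]];
            s1 += a;                                 // S_1,k, Eqn. (17.4)
            s2 += a * a;                             // S_2,k, Eqn. (17.5)
        }
        double mean = s1 / n;                        // Eqn. (17.1)
        double variance = (s2 - s1 * s1 / n) / n;    // Eqn. (17.3)
        return new double[] {mean, variance};
    }

    // Two-pass variance following the definition in Eqn. (17.2), for comparison.
    static double varianceTwoPass(float[][] img, int[][] region, int u, int v) {
        double mean = meanAndVariance(img, region, u, v)[0];
        double sum = 0;
        for (int[] d : region) {
            double diff = img[u + d[0]][v + d[1]] - mean;
            sum += diff * diff;
        }
        return sum / region.length;
    }
}
```

The one-pass form of Eqn. (17.3) saves the second loop but subtracts two nearly equal quantities, so it can lose precision when the values are large compared to their spread; accumulating in double, as above, is usually unproblematic for 8-bit data.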


The mean ($\mu$) of the subregion with the smallest variance ($\sigma^2$) is selected as the update value, that is,

$$ I'(u,v) \leftarrow \mu_{k'}(I,u,v), \quad \text{with} \quad k' = \mathop{\mathrm{argmin}}_{k=1,\ldots,K} \sigma_k^2(I,u,v). \tag{17.6} $$


The subregion structure originally proposed by Kuwahara et al. [144] is shown in Fig. 17.1(a) for a 3 × 3 filter (r = 1). It uses four square subregions of size (r+1) × (r+1) that overlap at the center. In general, the size of the whole filter is (2r+1) × (2r+1). This particular filter process is summarized in Alg. 17.1.

Note that this filter does not have a centered subregion, which means that the center pixel is always replaced by the mean of one of the neighboring regions, even if it fit the surrounding values perfectly. Thus the filter always performs a spatial shift, which introduces jitter and banding artifacts in regions of smooth intensity change. This effect is reduced with the filter proposed by Tomita and Tsuji [230], which is similar but includes a fifth subregion at its center (Fig. 17.1(b)). Filters of arbitrary size can be built by simply scaling the corresponding structure. In the case of the Tomita-Tsuji filter, the side length of the subregions should be odd.


2 $|R_k|$ denotes the size (number of pixels) of the subregion $R_k$.
Fig. 17.1
Subregion structures for Kuwahara-type filters. The original Kuwahara-Hachimura filter (a) considers four square, overlapping subregions [144]. Tomita-Tsuji filter (b) with five subregions (r = 2). The current center pixel (red) is contained in all subregions.


Note that replacing a pixel value by the mean of a square neighborhood is equivalent to linear filtering with a simple box kernel, which is not an optimal smoothing operator. To reduce the artifacts caused by the square subregions, alternative filter structures have been proposed, such as the 5 × 5 Nagao-Matsuyama filter [170] shown in Fig. 17.2.


Fig. 17.2
Subregions for the 5 × 5 (r = 2) Nagao-Matsuyama filter [170]. Note that the centered subregion ($R_1$) has a different size than the remaining subregions ($R_2, \ldots, R_9$).


If all subregions are of identical size $n$, the quantities

$$ \sigma_k^2(I,u,v) \cdot n = S_{2,k}(I,u,v) - S_{1,k}^2(I,u,v)/n \quad \text{or} \tag{17.7} $$

$$ \sigma_k^2(I,u,v) \cdot n^2 = S_{2,k}(I,u,v) \cdot n - S_{1,k}^2(I,u,v) \tag{17.8} $$

can be used to measure the amount of variation within the corresponding subregion. Both expressions require one multiplication less for each pixel than the “real” variance $\sigma_k^2$ in Eqn. (17.3). Moreover, if all subregions have the same shape (such as the filters in Fig. 17.1), additional optimizations are possible that substantially improve the performance. In this case, the local mean and variance need to be calculated only once over a fixed neighborhood for each image position. This type of filter can be efficiently implemented by using a set of precalculated maps for the local variance and mean values, as described in Alg. 17.2. As before, the parameter r specifies the radius of the composite filter, with subregions of size (r+1) × (r+1) and an overall size of (2r+1) × (2r+1); for example, r = 2 for the 5 × 5 filter shown in Fig. 17.1(b).
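Precalculating the maps S and A, as in Alg. 17.2, can be sketched in Java as follows. This is a hypothetical illustration rather than the book's implementation. It assumes a float[][] image indexed img[u][v], clamps all out-of-range image and map coordinates (a border policy the listing leaves open), and uses d_min = (r ÷ 2) − r and d_max = d_min + r, which for r = 2 gives the offsets −1, 0, 1. The selection loops scan every offset p, q between d_min and d_max, as written in the listing.

```java
// Sketch of the fast Kuwahara-type filter of Alg. 17.2 (hypothetical class).
// Assumptions: float[][] image indexed img[u][v]; out-of-range image and map
// coordinates are clamped to the nearest valid index.
final class FastKuwaharaFilter {

    static float[][] apply(float[][] img, int r, double tSigma) {
        int M = img.length, N = img[0].length;
        int dMin = (r / 2) - r;              // subregions' left/top offset
        int dMax = dMin + r;                 // subregions' right/bottom offset
        int n = (r + 1) * (r + 1);           // fixed subregion size
        double[][] S = new double[M][N];     // S(u,v) = n * variance, cf. Eqn. (17.7)
        double[][] A = new double[M][N];     // A(u,v) = mean
        // Pass 1: evaluate the square subregion at every position exactly once.
        for (int u = 0; u < M; u++) {
            for (int v = 0; v < N; v++) {
                double s1 = 0, s2 = 0;
                for (int i = dMin; i <= dMax; i++) {
                    for (int j = dMin; j <= dMax; j++) {
                        double a = img[clamp(u + i, M)][clamp(v + j, N)];
                        s1 += a;
                        s2 += a * a;
                    }
                }
                S[u][v] = s2 - s1 * s1 / n;
                A[u][v] = s1 / n;
            }
        }
        // Pass 2: pick the mean of the subregion with the smallest (scaled) variance.
        float[][] out = new float[M][N];
        for (int u = 0; u < M; u++) {
            for (int v = 0; v < N; v++) {
                double sMin = S[u][v] - tSigma * n;   // center region, favored by t_sigma
                double muMin = A[u][v];
                for (int p = dMin; p <= dMax; p++) {
                    for (int q = dMin; q <= dMax; q++) {
                        int up = clamp(u + p, M), vq = clamp(v + q, N);
                        if (S[up][vq] < sMin) {
                            sMin = S[up][vq];
                            muMin = A[up][vq];
                        }
                    }
                }
                out[u][v] = (float) muMin;
            }
        }
        return out;
    }

    static int clamp(int k, int size) {
        return Math.max(0, Math.min(size - 1, k));
    }
}
```

For example, FastKuwaharaFilter.apply(img, 2, 10.0) uses a 5 × 5 footprint. A larger t_σ makes the filter keep the center region's mean more often, which is the banding-reduction mechanism of the variance threshold.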

All these filters tend to generate banding artifacts in smooth image regions due to erratic spatial displacements, which become worse with increasing filter size. If a centered subregion is used (such as $R_5$ in Fig. 17.1(b) or $R_1$ in Fig. 17.2), one could reduce this effect by applying a threshold ($t_\sigma$) to select any off-center subregion $R_k$ only if its variance is significantly smaller than the variance of the center region (see Alg. 17.2, line 13).

Alg. 17.1
Simple Kuwahara-Hachimura filter.

1: KuwaharaFilter(I)
     Input: I, a grayscale image of size M × N.
     Returns a new (filtered) image of size M × N.
2:   R1 ← {(−1,−1), (0,−1), (−1,0), (0,0)}
3:   R2 ← {(0,−1), (1,−1), (0,0), (1,0)}
4:   R3 ← {(0,0), (1,0), (0,1), (1,1)}
5:   R4 ← {(−1,0), (0,0), (−1,1), (0,1)}
6:   I' ← Duplicate(I)
7:   (M, N) ← Size(I)
8:   for all image coordinates (u, v) ∈ M × N do
9:     σ²_min ← ∞
10:    for R ← R1, ..., R4 do
11:      (σ², μ) ← EvalSubregion(I, R, u, v)
12:      if σ² < σ²_min then
13:        σ²_min ← σ²
14:        μ_min ← μ
15:    I'(u, v) ← μ_min
16:  return I'

17: EvalSubregion(I, R, u, v)
     Returns the variance and mean of the grayscale image I for the subregion R positioned at (u, v).
18:  n ← Size(R)
19:  S1 ← 0, S2 ← 0
20:  for all (i, j) ∈ R do
21:    a ← I(u + i, v + j)
22:    S1 ← S1 + a                  ▷ Eq. 17.4
23:    S2 ← S2 + a²                 ▷ Eq. 17.5
24:  σ² ← (S2 − S1²/n)/n            ▷ variance of subregion R, see Eq. 17.3
25:  μ ← S1/n                       ▷ mean of subregion R, see Eq. 17.1
26:  return (σ², μ)
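Alg. 17.1 translates almost line by line into Java. The sketch below is hypothetical (class and method names are not from the text), works on a float[][] image indexed img[u][v], and clamps coordinates at the image border, a border policy the algorithm itself does not specify.

```java
// Sketch of the simple Kuwahara-Hachimura filter (Alg. 17.1); hypothetical class.
// Assumptions: float[][] image indexed img[u][v]; coordinates outside the
// image are clamped to the border.
final class KuwaharaFilter {

    // The four 2x2 subregions R1..R4 as (i, j) offsets, cf. Fig. 17.1(a).
    static final int[][][] REGIONS = {
        {{-1, -1}, {0, -1}, {-1, 0}, {0, 0}},   // R1
        {{0, -1}, {1, -1}, {0, 0}, {1, 0}},     // R2
        {{0, 0}, {1, 0}, {0, 1}, {1, 1}},       // R3
        {{-1, 0}, {0, 0}, {-1, 1}, {0, 1}}      // R4
    };

    static float[][] apply(float[][] img) {
        int M = img.length, N = img[0].length;
        float[][] out = new float[M][N];
        for (int u = 0; u < M; u++) {
            for (int v = 0; v < N; v++) {
                double varMin = Double.POSITIVE_INFINITY;
                double muMin = 0;
                for (int[][] region : REGIONS) {
                    int n = region.length;
                    double s1 = 0, s2 = 0;
                    for (int[] d : region) {
                        double a = img[clamp(u + d[0], M)][clamp(v + d[1], N)];
                        s1 += a;                          // Eqn. (17.4)
                        s2 += a * a;                      // Eqn. (17.5)
                    }
                    double var = (s2 - s1 * s1 / n) / n;  // Eqn. (17.3)
                    if (var < varMin) {                   // ties keep the earlier region
                        varMin = var;
                        muMin = s1 / n;                   // Eqn. (17.1)
                    }
                }
                out[u][v] = (float) muMin;                // Eqn. (17.6)
            }
        }
        return out;
    }

    static int clamp(int k, int size) {
        return Math.max(0, Math.min(size - 1, k));
    }
}
```

On an ideal step edge the output equals the input, because on each side one subregion lies entirely within the flat part and has zero variance. An isolated bright pixel, in contrast, is only attenuated (to a quarter of its height over a zero background), since every subregion contains the center pixel.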


17.1.1  Application  to  Color  Images 

While all of the aforementioned filters were originally designed for grayscale images, they are easily modified to work with color images. We only need to specify how to calculate the variance and mean for any subregion; the decision and replacement mechanisms then remain the same.

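Given an RGB color image $I = (I_R, I_G, I_B)$ with a subregion $R_k$, we can calculate the local mean and variance for each color channel as

$$ \boldsymbol{\mu}_k(I,u,v) = \begin{pmatrix} \mu_k(I_R,u,v) \\ \mu_k(I_G,u,v) \\ \mu_k(I_B,u,v) \end{pmatrix}, \qquad \boldsymbol{\sigma}_k^2(I,u,v) = \begin{pmatrix} \sigma_k^2(I_R,u,v) \\ \sigma_k^2(I_G,u,v) \\ \sigma_k^2(I_B,u,v) \end{pmatrix}, \tag{17.9} $$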
Alg. 17.2
Fast Kuwahara-type (Tomita-Tsuji) filter with variable size and fixed subregion structure. The filter uses five square subregions of size (r+1) × (r+1), with a composite filter of (2r+1) × (2r+1), as shown in Fig. 17.1(b). The purpose of the variance threshold t_σ is to reduce banding effects in smooth image regions (typically t_σ = 5, ..., 50 for 8-bit images).

1: FastKuwaharaFilter(I, r, t_σ)
     Input: I, a grayscale image of size M × N; r, filter radius (r ≥ 1); t_σ, variance threshold.
     Returns a new (filtered) image of size M × N.
2:   (M, N) ← Size(I)
3:   Create maps:
       S: M × N → ℝ               ▷ local variance S(u, v) = n · σ²(I, u, v)
       A: M × N → ℝ               ▷ local mean A(u, v) = μ(I, u, v)
4:   d_min ← (r ÷ 2) − r          ▷ subregions' left/top position
5:   d_max ← d_min + r            ▷ subregions' right/bottom position
6:   n ← (r + 1)²                 ▷ fixed subregion size
7:   for all image coordinates (u, v) ∈ M × N do
8:     (s, μ) ← EvalSquareSubregion(I, u, v, d_min, d_max)
9:     S(u, v) ← s
10:    A(u, v) ← μ
11:  I' ← Duplicate(I)
12:  for all image coordinates (u, v) ∈ M × N do
13:    s_min ← S(u, v) − t_σ · n  ▷ variance of center region
14:    μ_min ← A(u, v)            ▷ mean of center region
15:    for p ← d_min, ..., d_max do
16:      for q ← d_min, ..., d_max do
17:        if S(u + p, v + q) < s_min then
18:          s_min ← S(u + p, v + q)
19:          μ_min ← A(u + p, v + q)
20:    I'(u, v) ← μ_min
21:  return I'

22: EvalSquareSubregion(I, u, v, d_min, d_max)
     Returns the variance and mean of the grayscale image I for a square subregion positioned at (u, v).
23:  S1 ← 0, S2 ← 0
24:  for i ← d_min, ..., d_max do
25:    for j ← d_min, ..., d_max do
26:      a ← I(u + i, v + j)
27:      S1 ← S1 + a              ▷ Eq. 17.4
28:      S2 ← S2 + a²             ▷ Eq. 17.5
29:  s ← S2 − S1²/n               ▷ subregion variance (s = n · σ²)
30:  μ ← S1/n                     ▷ subregion mean (μ)
31:  return (s, μ)


with $\mu_k()$, $\sigma_k^2()$ as defined in Eqns. (17.1) and (17.3), respectively. Analogous to the grayscale case, each pixel is then replaced by the average color in the subregion with the smallest variance, that is,

$$ I'(u,v) \leftarrow \boldsymbol{\mu}_{k'}(I,u,v), \quad \text{with} \quad k' = \mathop{\mathrm{argmin}}_{k=1,\ldots,K} \sigma_{k,\mathrm{RGB}}^2(I,u,v). \tag{17.10} $$

The overall variance $\sigma_{k,\mathrm{RGB}}^2$, used to determine $k'$ in Eqn. (17.10), can be defined in different ways, for example, as the sum of the variances in the individual color channels, that is,

$$ \sigma_{k,\mathrm{RGB}}^2(I,u,v) = \sigma_k^2(I_R,u,v) + \sigma_k^2(I_G,u,v) + \sigma_k^2(I_B,u,v). \tag{17.11} $$
Alg. 17.3
Color version of the Kuwahara-type filter (adapted from Alg. 17.1). The algorithm uses the definition in Eqn. (17.11) for the total variance σ² in the subregion R (see line 25). The vector μ (calculated in line 26) is the average color of the subregion.

1: KuwaharaFilterColor(I)
     Input: I, an RGB image of size M × N.
     Returns a new (filtered) color image of size M × N.
2:   R1 ← {(−1,−1), (0,−1), (−1,0), (0,0)}
3:   R2 ← {(0,−1), (1,−1), (0,0), (1,0)}
4:   R3 ← {(0,0), (1,0), (0,1), (1,1)}
5:   R4 ← {(−1,0), (0,0), (−1,1), (0,1)}
6:   I' ← Duplicate(I)
7:   (M, N) ← Size(I)
8:   for all image coordinates (u, v) ∈ M × N do
9:     σ²_min ← ∞
10:    for R ← R1, ..., R4 do
11:      (σ², μ) ← EvalSubregion(I, R, u, v)
12:      if σ² < σ²_min then
13:        σ²_min ← σ²
14:        μ_min ← μ
15:    I'(u, v) ← μ_min
16:  return I'

17: EvalSubregion(I, R, u, v)
     Returns the total variance and the mean vector of the color image I for the subregion R positioned at (u, v).
18:  n ← Size(R)
19:  S1 ← 0, S2 ← 0                 ▷ S1, S2 ∈ ℝ³
20:  for all (i, j) ∈ R do
21:    a ← I(u + i, v + j)          ▷ a ∈ ℝ³
22:    S1 ← S1 + a
23:    S2 ← S2 + a²                 ▷ a² = (a_R², a_G², a_B²), component-wise
24:  S ← (S2 − S1²/n)/n             ▷ component-wise; S = (σ²_R, σ²_G, σ²_B)
25:  σ²_RGB ← ΣS                    ▷ σ²_RGB = σ²_R + σ²_G + σ²_B, total variance in R
26:  μ ← (1/n) · S1                 ▷ μ ∈ ℝ³, avg. color vector for subregion R
27:  return (σ²_RGB, μ)

This is sometimes called the “total variance”. The resulting filter process is summarized in Alg. 17.3, and color examples produced with this algorithm are shown in Figs. 17.3 and 17.4.
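The only part of Alg. 17.3 that differs from the grayscale version is EvalSubregion. A hypothetical Java sketch of it follows; it assumes the RGB image is stored as float[3][M][N] and indexed rgb[c][u][v] with c = 0, 1, 2 for R, G, B, and that the subregion is given as an array of (i, j) offsets. Neither layout is prescribed by the text.

```java
// Sketch of EvalSubregion from Alg. 17.3 (hypothetical class).
// Assumptions: RGB image stored as float[3][M][N], indexed rgb[c][u][v]
// with c = 0, 1, 2 for R, G, B; region = array of (i, j) offsets.
final class ColorSubregionStats {

    // Returns {total variance, mean R, mean G, mean B} of the subregion at (u, v).
    static double[] evalSubregion(float[][][] rgb, int[][] region, int u, int v) {
        int n = region.length;
        double[] s1 = new double[3];                  // per-channel sums
        double[] s2 = new double[3];                  // per-channel sums of squares
        for (int[] d : region) {
            for (int c = 0; c < 3; c++) {
                double a = rgb[c][u + d[0]][v + d[1]];
                s1[c] += a;
                s2[c] += a * a;                       // component-wise square
            }
        }
        double[] result = new double[4];
        for (int c = 0; c < 3; c++) {
            result[0] += (s2[c] - s1[c] * s1[c] / n) / n;  // add sigma_c^2, Eqn. (17.11)
            result[c + 1] = s1[c] / n;                     // mean of channel c
        }
        return result;
    }
}
```

With this helper, the selection loop stays exactly as in the grayscale case: keep the subregion with the smallest result[0] and copy its three channel means into the output pixel.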

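Alternatively [109], one could define the combined color variance as the norm of the color covariance matrix³ for the subregion $R_k$,

$$ \boldsymbol{\Sigma}_k(I,u,v) = \begin{pmatrix} \sigma_{k,RR} & \sigma_{k,RG} & \sigma_{k,RB} \\ \sigma_{k,GR} & \sigma_{k,GG} & \sigma_{k,GB} \\ \sigma_{k,BR} & \sigma_{k,BG} & \sigma_{k,BB} \end{pmatrix}, \tag{17.12} $$

with

$$ \sigma_{k,pq} = \frac{1}{|R_k|} \cdot \sum_{(i,j) \in R_k} \bigl[ I_p(u+i, v+j) - \mu_k(I_p,u,v) \bigr] \cdot \bigl[ I_q(u+i, v+j) - \mu_k(I_q,u,v) \bigr], \tag{17.13} $$

for all possible color pairs $(p, q) \in \{R, G, B\}^2$. Note that $\sigma_{k,pp} = \sigma_k^2(I_p,u,v)$ and $\sigma_{k,pq} = \sigma_{k,qp}$, and thus the matrix $\boldsymbol{\Sigma}_k$ is symmetric and only 6 of its 9 entries need to be calculated. The (Frobenius) norm of the 3 × 3 color covariance matrix is defined as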

3 See Sec. D.2 in the Appendix for details.


Fig. 17.3
Kuwahara-type (Tomita-Tsuji) filter: color example using the variance definition in Eqn. (17.11). The filter radius is varied from r = 1 (b) to r = 4 (e). [Panels include (b) r = 1 (3 × 3 filter) and (d) r = 3 (7 × 7 filter).]


$$ \sigma_{k,\mathrm{RGB}}^2 = \bigl\| \boldsymbol{\Sigma}_k(I,u,v) \bigr\|_2^2 = \sum_{p,q \in \{R,G,B\}} (\sigma_{k,pq})^2. \tag{17.14} $$


Note that the total variance in Eqn. (17.11), which is simpler to calculate than this norm, is equivalent to the trace of the covariance matrix $\boldsymbol{\Sigma}_k$.
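The relation between the two variance measures can be checked numerically. The hypothetical sketch below computes the covariance matrix of Eqn. (17.13), fills only its upper triangle and mirrors it (exploiting the symmetry noted above), and returns both the squared Frobenius norm of Eqn. (17.14) and the trace. It assumes the RGB image is stored as float[3][M][N] and indexed rgb[c][u][v].

```java
// Sketch of Eqns. (17.12)-(17.14): color covariance of one subregion (hypothetical class).
// Assumptions: RGB image stored as float[3][M][N], indexed rgb[c][u][v];
// region = array of (i, j) offsets.
final class ColorCovariance {

    // 3x3 covariance matrix Sigma_k, Eqn. (17.13).
    static double[][] covariance(float[][][] rgb, int[][] region, int u, int v) {
        int n = region.length;
        double[] mu = new double[3];
        for (int[] d : region)
            for (int c = 0; c < 3; c++)
                mu[c] += rgb[c][u + d[0]][v + d[1]] / (double) n;
        double[][] cov = new double[3][3];
        for (int[] d : region) {
            for (int p = 0; p < 3; p++) {
                double dp = rgb[p][u + d[0]][v + d[1]] - mu[p];
                for (int q = p; q < 3; q++) {         // upper triangle only (symmetry)
                    double dq = rgb[q][u + d[0]][v + d[1]] - mu[q];
                    cov[p][q] += dp * dq / n;
                }
            }
        }
        for (int p = 1; p < 3; p++)
            for (int q = 0; q < p; q++)
                cov[p][q] = cov[q][p];                // mirror into the lower triangle
        return cov;
    }

    // Squared Frobenius norm, Eqn. (17.14).
    static double frobeniusSquared(double[][] m) {
        double sum = 0;
        for (double[] row : m)
            for (double x : row) sum += x * x;
        return sum;
    }

    // Trace of the matrix, equal to the total variance of Eqn. (17.11).
    static double trace(double[][] m) {
        return m[0][0] + m[1][1] + m[2][2];
    }
}
```

The trace ignores the off-diagonal (cross-channel) terms, while the Frobenius norm includes them. Note also that Eqn. (17.14) is expressed in squared-variance units, so a threshold tuned for one measure does not carry over directly to the other.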
Fig. 17.4
Color versions of the Tomita-Tsuji (Fig. 17.1(b)) and Nagao-Matsuyama filter (Fig. 17.2). Both filters are of size 5 × 5 and use the variance definition in Eqn. (17.11). Results are visually similar, but in general the Nagao-Matsuyama filter is slightly less destructive on diagonal structures. Original image in Fig. 17.3(a). Panels: (a) 5 × 5 Tomita-Tsuji filter (r = 2); (b) 5 × 5 Nagao-Matsuyama filter.


Since each pixel of the filtered image is calculated as the mean (i.e., a linear combination) of a set of original color pixels, the results depend on the color space used, as discussed in Chapter 15, Sec. 15.1.2.


17.2  Bilateral  Filter 

Traditional linear smoothing filters operate by convolving the image with a kernel whose coefficients act as weights for the corresponding image pixels and depend only on the spatial distance from the center coordinate. Pixels close to the filter center are typically given larger weights, while pixels at a greater distance carry smaller weights. Thus the convolution kernel effectively encodes the closeness of the underlying pixels in space. In the following, a filter whose weights depend only on the distance in the spatial domain is called a domain filter.

To make smoothing filters less destructive on edges, a typical strategy is to exclude individual pixels from the filter operation or to reduce the weight of their contribution if they are very dissimilar in value to the pixel found at the center position. This operation, too, can be formulated as a filter, but this time the kernel coefficients depend only upon the differences in pixel values or range. Therefore this is called a range filter, as explained in more detail in Sec. 17.2.2. The idea of the bilateral filter, proposed by Tomasi and Manduchi in [229], is to combine both domain and range filtering into a common, edge-preserving smoothing filter.


17.2.1 Domain Filter


In an ordinary 2D linear filter (or “convolution”) operation⁴,

$$ I'(u,v) \leftarrow \sum_{m=-\infty}^{\infty} \sum_{n=-\infty}^{\infty} I(u+m, v+n) \cdot H(m,n) \tag{17.15} $$

$$ = \sum_{i=-\infty}^{\infty} \sum_{j=-\infty}^{\infty} I(i,j) \cdot H(i-u, j-v), \tag{17.16} $$

every new pixel value $I'(u,v)$ is the weighted average of the original image pixels $I$ in a certain neighborhood, with the weights specified by the elements of the filter kernel $H$.⁵ The weight assigned to each pixel depends only on its spatial position relative to the current center coordinate $(u,v)$. In particular, $H(0,0)$ specifies the weight of the center pixel $I(u,v)$, and $H(m,n)$ is the weight assigned to a pixel displaced by $(m,n)$ from the center. Since only the spatial image coordinates are relevant, such a filter is called a domain filter. Obviously, ordinary filters as we know them are all domain filters.

4 See also Chapter 5, Eqn. (5.5) on page 92.

17.2.2  Range  Filter 

Although the idea may appear strange at first, one could also apply a linear filter to the pixel values or range of an image in the form

$$ I'_r(u,v) \leftarrow \sum_{i=-\infty}^{\infty} \sum_{j=-\infty}^{\infty} I(i,j) \cdot H_r\bigl( I(i,j) - I(u,v) \bigr). \tag{17.17} $$

The contribution of each pixel is specified by the function $H_r$ and depends on the difference between its own value $I(i,j)$ and the value at the current center pixel $I(u,v)$. The operation in Eqn. (17.17) is called a range filter, where the spatial position of a contributing pixel is irrelevant and only the difference in values is considered. For a given position $(u,v)$, all surrounding image pixels $I(i,j)$ with the same value contribute equally to the result $I'_r(u,v)$. Consequently, the application of a range filter has no spatial effect upon the image: in contrast to a domain filter, no blurring or sharpening will occur. Instead, a range filter effectively performs a global point operation by remapping the intensity or color values. However, a global range filter by itself is of little use, since it combines pixels from the entire image and only changes the intensity or color map of the image, equivalent to a nonlinear, image-dependent point operation.
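The claim that a global range filter is just an image-dependent point operation can be made concrete: its result for a pixel depends only on that pixel's value and on the image histogram, so it can be tabulated once per gray value. The hypothetical sketch below assumes an 8-bit image stored as int[][], a Gaussian $H_r$ without its constant factor, and normalization by the sum of weights. The normalization is an assumption added here (by analogy with the bilateral filter); Eqn. (17.17) itself is unnormalized.

```java
// Sketch: a global range filter (Eqn. 17.17) reduces to a lookup table (hypothetical class).
// Assumptions: 8-bit image as int[][] with values 0..255; Gaussian H_r without
// its constant factor; normalization by the sum of weights (not part of Eqn. 17.17).
final class GlobalRangeFilter {

    static double hr(double x, double sigmaR) {
        return Math.exp(-(x * x) / (2 * sigmaR * sigmaR));
    }

    // Direct evaluation: every pixel of the image contributes to every output pixel.
    static double[][] bruteForce(int[][] img, double sigmaR) {
        int M = img.length, N = img[0].length;
        double[][] out = new double[M][N];
        for (int u = 0; u < M; u++) {
            for (int v = 0; v < N; v++) {
                double sum = 0, wSum = 0;
                for (int[] row : img) {
                    for (int b : row) {
                        double w = hr(b - img[u][v], sigmaR);
                        sum += w * b;
                        wSum += w;
                    }
                }
                out[u][v] = sum / wSum;
            }
        }
        return out;
    }

    // Same result as a point operation: one table entry per gray value a,
    // computed from the histogram alone.
    static double[] lookupTable(int[][] img, double sigmaR) {
        int[] hist = new int[256];
        for (int[] row : img)
            for (int b : row) hist[b]++;
        double[] lut = new double[256];
        for (int a = 0; a < 256; a++) {
            double sum = 0, wSum = 0;
            for (int b = 0; b < 256; b++) {
                double w = hist[b] * hr(b - a, sigmaR);
                sum += w * b;
                wSum += w;
            }
            lut[a] = (wSum > 0) ? sum / wSum : a;    // values with no weight: keep as is
        }
        return lut;
    }
}
```

For every pixel, bruteForce(img, s)[u][v] agrees with lookupTable(img, s)[img[u][v]] up to rounding. This is the position-independent remapping described above.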

17.2.3  Bilateral  Filter — General  Idea 

The key idea behind the bilateral filter is to combine domain filtering (Eqn. (17.16)) and range filtering (Eqn. (17.17)) in the form

$$ I'(u,v) = \frac{1}{W_{u,v}} \cdot \sum_{i=-\infty}^{\infty} \sum_{j=-\infty}^{\infty} I(i,j) \cdot \underbrace{H_d(i-u, j-v) \cdot H_r\bigl( I(i,j) - I(u,v) \bigr)}_{w_{i,j}}, \tag{17.18} $$

5 In Eqn. (17.16), functions I and H are assumed to be zero outside their domains of definition.
where $H_d$, $H_r$ are the domain and range kernels, respectively, $w_{i,j}$ are the composite weights, and

$$ W_{u,v} = \sum_{i=-\infty}^{\infty} \sum_{j=-\infty}^{\infty} w_{i,j} = \sum_{i=-\infty}^{\infty} \sum_{j=-\infty}^{\infty} H_d(i-u, j-v) \cdot H_r\bigl( I(i,j) - I(u,v) \bigr) \tag{17.19} $$

is the (position-dependent) sum of the weights $w_{i,j}$ used to normalize the combined filter kernel.

In this form, the scope of range filtering is constrained to the spatial neighborhood defined by the domain kernel $H_d$. At a given filter position $(u,v)$, the weight $w_{i,j}$ assigned to each contributing pixel depends upon (1) its spatial position relative to $(u,v)$, and (2) the similarity of its pixel value to the value at the center position $(u,v)$. In other words, the resulting pixel is the weighted average of pixels that are nearby and similar to the original pixel. In a flat image region, where most surrounding pixels have values similar to the center pixel, the bilateral filter acts as a conventional smoothing filter, controlled only by the domain kernel $H_d$. However, when placed near a step edge or on an intensity ridge, only those pixels that are similar in value to the center pixel are included in the smoothing process, thus avoiding blurring the edges.

If the domain kernel $H_d$ has a limited radius $D$, or size $(2D+1) \times (2D+1)$, the bilateral filter defined in Eqn. (17.18) can be written as

$$ I'(u,v) = \frac{\sum_{i=u-D}^{u+D} \sum_{j=v-D}^{v+D} I(i,j) \cdot H_d(i-u, j-v) \cdot H_r\bigl( I(i,j) - I(u,v) \bigr)}{\sum_{i=u-D}^{u+D} \sum_{j=v-D}^{v+D} H_d(i-u, j-v) \cdot H_r\bigl( I(i,j) - I(u,v) \bigr)} \tag{17.20} $$

$$ = \frac{\sum_{m=-D}^{D} \sum_{n=-D}^{D} I(u+m, v+n) \cdot H_d(m,n) \cdot H_r\bigl( I(u+m, v+n) - I(u,v) \bigr)}{\sum_{m=-D}^{D} \sum_{n=-D}^{D} H_d(m,n) \cdot H_r\bigl( I(u+m, v+n) - I(u,v) \bigr)} \tag{17.21} $$

(by substituting $(i-u) \rightarrow m$ and $(j-v) \rightarrow n$). The effective, space-variant filter kernel for the image $I$ at position $(u,v)$ then is

$$ \bar{H}_{I,u,v}(i,j) = \frac{H_d(i,j) \cdot H_r\bigl( I(u+i, v+j) - I(u,v) \bigr)}{\sum_{m=-D}^{D} \sum_{n=-D}^{D} H_d(m,n) \cdot H_r\bigl( I(u+m, v+n) - I(u,v) \bigr)} \tag{17.22} $$

for $-D \le i, j \le D$, whereas $\bar{H}_{I,u,v}(i,j) = 0$ otherwise. This quantity specifies the contribution of the original image pixels $I(u+i, v+j)$ to the resulting new pixel value $I'(u,v)$.
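The effective kernel of Eqn. (17.22) can be computed explicitly for a single position, which is a convenient way to inspect how the filter adapts near an edge. The sketch below is hypothetical and makes illustrative choices that Eqn. (17.22) leaves open at this point: Gaussian $H_d$ and $H_r$ without constant factors, $D = \lceil 3.5\,\sigma_d \rceil$, a float[][] image indexed img[u][v], and clamped coordinates at the image border.

```java
// Sketch of the effective, space-variant kernel of Eqn. (17.22) at one position.
// Assumptions (illustrative): Gaussian H_d and H_r without constant factors,
// D = ceil(3.5 * sigmaD), float[][] image indexed img[u][v], clamped borders.
final class BilateralKernel {

    // Returns k with k[i + D][j + D] = normalized weight of pixel I(u+i, v+j).
    static double[][] effectiveKernel(float[][] img, int u, int v,
                                      double sigmaD, double sigmaR) {
        int D = (int) Math.ceil(3.5 * sigmaD);
        int M = img.length, N = img[0].length;
        double a = img[u][v];                         // center value I(u,v)
        double[][] k = new double[2 * D + 1][2 * D + 1];
        double wSum = 0;                              // denominator of Eqn. (17.22)
        for (int i = -D; i <= D; i++) {
            for (int j = -D; j <= D; j++) {
                double b = img[clamp(u + i, M)][clamp(v + j, N)];
                double hd = Math.exp(-(i * i + j * j) / (2 * sigmaD * sigmaD));
                double hr = Math.exp(-((b - a) * (b - a)) / (2 * sigmaR * sigmaR));
                k[i + D][j + D] = hd * hr;
                wSum += hd * hr;
            }
        }
        for (double[] row : k)
            for (int j = 0; j < row.length; j++)
                row[j] /= wSum;                       // weights now sum to 1
        return k;
    }

    static int clamp(int idx, int size) {
        return Math.max(0, Math.min(size - 1, idx));
    }
}
```

Placed next to a step edge whose height is large compared to σ_r, practically all kernel mass stays on the center pixel's side of the edge, which is the behavior illustrated in the figures of this section.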


A special (but common) case is the use of Gaussian kernels for both the domain and the range parts of the bilateral filter. The discrete 2D Gaussian domain kernel of width $\sigma_d$ is defined as

$$ H_d^{G,\sigma_d}(m,n) = \frac{1}{2\pi\sigma_d^2} \cdot \exp\Bigl( -\frac{m^2+n^2}{2\sigma_d^2} \Bigr) \tag{17.23} $$

$$ = \frac{1}{\sqrt{2\pi}\,\sigma_d} \cdot \exp\Bigl( -\frac{m^2}{2\sigma_d^2} \Bigr) \cdot \frac{1}{\sqrt{2\pi}\,\sigma_d} \cdot \exp\Bigl( -\frac{n^2}{2\sigma_d^2} \Bigr), \tag{17.24} $$

for $m, n \in \mathbb{Z}$. It has its maximum at the center ($m = n = 0$) and declines smoothly and isotropically with increasing radius $\rho = \sqrt{m^2+n^2}$; for $\rho > 3.5\,\sigma_d$, $H_d^{G,\sigma_d}(m,n)$ is practically zero. The factorization in Eqn. (17.24) indicates that the 2D Gaussian kernel can be separated into 1D Gaussians, allowing for a more efficient implementation.⁶ The constant factors $1/(\sqrt{2\pi}\,\sigma_d)$ can be omitted in practice, since the bilateral filter requires individual normalization at each image position (Eqn. (17.19)).
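The separability stated in Eqn. (17.24) can be verified numerically. The following hypothetical sketch builds a normalized 1D Gaussian with radius $D = \lceil 3.5\,\sigma_d \rceil$ (dropping the constant factor and renormalizing instead), then filters an image once with the full 2D kernel and once with two 1D passes. The double[][] layout indexed img[u][v] and the clamped borders are assumptions.

```java
// Sketch illustrating the separability in Eqn. (17.24); hypothetical class.
// Assumptions: double[][] image indexed img[u][v], clamped borders,
// D = ceil(3.5 * sigma), constant factor replaced by normalization.
final class GaussianDomainFilter {

    // Normalized 1D Gaussian, g[m + D] for m = -D..D.
    static double[] kernel1D(double sigma) {
        int D = (int) Math.ceil(3.5 * sigma);
        double[] g = new double[2 * D + 1];
        double sum = 0;
        for (int m = -D; m <= D; m++) {
            g[m + D] = Math.exp(-(m * m) / (2 * sigma * sigma));
            sum += g[m + D];
        }
        for (int k = 0; k < g.length; k++) g[k] /= sum;   // coefficients sum to 1
        return g;
    }

    // Direct 2D filtering with H(m, n) = g(m) * g(n): (2D+1)^2 products per pixel.
    static double[][] filter2D(double[][] img, double sigma) {
        double[] g = kernel1D(sigma);
        int D = g.length / 2, M = img.length, N = img[0].length;
        double[][] out = new double[M][N];
        for (int u = 0; u < M; u++)
            for (int v = 0; v < N; v++) {
                double s = 0;
                for (int m = -D; m <= D; m++)
                    for (int n = -D; n <= D; n++)
                        s += img[clamp(u + m, M)][clamp(v + n, N)] * g[m + D] * g[n + D];
                out[u][v] = s;
            }
        return out;
    }

    // Same result with two 1D passes: 2(2D+1) products per pixel.
    static double[][] filterSeparable(double[][] img, double sigma) {
        double[] g = kernel1D(sigma);
        int D = g.length / 2, M = img.length, N = img[0].length;
        double[][] tmp = new double[M][N];
        double[][] out = new double[M][N];
        for (int u = 0; u < M; u++)                      // pass 1: along u
            for (int v = 0; v < N; v++) {
                double s = 0;
                for (int m = -D; m <= D; m++) s += img[clamp(u + m, M)][v] * g[m + D];
                tmp[u][v] = s;
            }
        for (int u = 0; u < M; u++)                      // pass 2: along v
            for (int v = 0; v < N; v++) {
                double s = 0;
                for (int n = -D; n <= D; n++) s += tmp[u][clamp(v + n, N)] * g[n + D];
                out[u][v] = s;
            }
        return out;
    }

    static int clamp(int k, int size) {
        return Math.max(0, Math.min(size - 1, k));
    }
}
```

With σ_d = 2.0 the radius is D = 7, that is, a 15 × 15 kernel. Note that this shortcut applies only to the domain part: once the range factor is multiplied in, the composite bilateral kernel is generally no longer separable.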

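Similarly, the corresponding range filter kernel is defined as a (continuous) 1D Gaussian of width $\sigma_r$,

$$ H_r^{G,\sigma_r}(x) = \frac{1}{\sqrt{2\pi}\,\sigma_r} \cdot \exp\Bigl( -\frac{x^2}{2\sigma_r^2} \Bigr), \tag{17.25} $$

for $x \in \mathbb{R}$. The constant factor $1/(\sqrt{2\pi}\,\sigma_r)$ may again be omitted and the resulting composite filter (Eqn. (17.18)) can thus be written as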


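$$ I'(u,v) = \frac{1}{W_{u,v}} \cdot \sum_{i=u-D}^{u+D} \sum_{j=v-D}^{v+D} I(i,j) \cdot H_d^{G,\sigma_d}(i-u, j-v) \cdot H_r^{G,\sigma_r}\bigl( I(i,j) - I(u,v) \bigr) \tag{17.26} $$

$$ = \frac{1}{W_{u,v}} \cdot \sum_{m=-D}^{D} \sum_{n=-D}^{D} I(u+m, v+n) \cdot H_d^{G,\sigma_d}(m,n) \cdot H_r^{G,\sigma_r}\bigl( I(u+m, v+n) - I(u,v) \bigr) \tag{17.27} $$

$$ = \frac{1}{W_{u,v}} \cdot \sum_{m=-D}^{D} \sum_{n=-D}^{D} I(u+m, v+n) \cdot \exp\Bigl( -\frac{m^2+n^2}{2\sigma_d^2} \Bigr) \cdot \exp\Bigl( -\frac{(I(u+m, v+n) - I(u,v))^2}{2\sigma_r^2} \Bigr), \tag{17.28} $$

with $D = \lceil 3.5 \cdot \sigma_d \rceil$ and

$$ W_{u,v} = \sum_{m=-D}^{D} \sum_{n=-D}^{D} \exp\Bigl( -\frac{m^2+n^2}{2\sigma_d^2} \Bigr) \cdot \exp\Bigl( -\frac{(I(u+m, v+n) - I(u,v))^2}{2\sigma_r^2} \Bigr). \tag{17.29} $$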


For 8-bit grayscale images, with pixel values in the range [0, 255], the width of the range kernel is typically set to $\sigma_r = 10, \ldots, 50$. The width of the domain kernel ($\sigma_d$) depends on the desired amount of spatial smoothing. Algorithm 17.4 gives a summary of the steps involved in bilateral filtering for grayscale images.

6 See also Chapter 5, Sec. 5.3.3.
Alg. 17.4
Bilateral filter with Gaussian kernels (grayscale version).

1: BilateralFilterGray(I, σ_d, σ_r)
     Input: I, a grayscale image of size M × N; σ_d, width of the 2D Gaussian domain kernel; σ_r, width of the 1D Gaussian range kernel.
     Returns a new filtered image of size M × N.
2:   (M, N) ← Size(I)
3:   D ← ⌈3.5 · σ_d⌉                     ▷ width of domain filter kernel
4:   I' ← Duplicate(I)
5:   for all image coordinates (u, v) ∈ M × N do
6:     S ← 0                              ▷ sum of weighted pixel values
7:     W ← 0                              ▷ sum of weights
8:     a ← I(u, v)                        ▷ center pixel value
9:     for m ← −D, ..., D do
10:      for n ← −D, ..., D do
11:        b ← I(u + m, v + n)            ▷ off-center pixel value
12:        w_d ← exp(−(m² + n²)/(2σ_d²))  ▷ domain coefficient
13:        w_r ← exp(−(a − b)²/(2σ_r²))   ▷ range coefficient
14:        w ← w_d · w_r                  ▷ composite coefficient
15:        S ← S + w · b
16:        W ← W + w
17:    I'(u, v) ← S/W
18:  return I'

Figures 17.5–17.9 show the effective, space-variant filter kernels (see Eqn. (17.22)) and the results of applying a bilateral filter with Gaussian domain and range kernels in different situations. Uniform noise was applied to the original images to demonstrate the filtering effect. One can see clearly how the range part makes the combined filter kernel adapt to the local image structure. Only those surrounding parts that have brightness values similar to the center pixel are included in the filter operation. The filter parameters were set to $\sigma_d = 2.0$ and $\sigma_r = 50$; the domain kernel is of size 15 × 15.
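Alg. 17.4 maps directly to Java. The following is a hypothetical sketch for a float[][] image indexed img[u][v]. The listing does not say what happens near the image border, so clamping coordinates to the border is an assumption made here.

```java
// Sketch of Alg. 17.4 (bilateral filter, Gaussian kernels, grayscale); hypothetical class.
// Assumptions: float[][] image indexed img[u][v]; coordinates outside the
// image are clamped to the border (the listing leaves this open).
final class BilateralFilterGray {

    static float[][] apply(float[][] img, double sigmaD, double sigmaR) {
        int M = img.length, N = img[0].length;
        int D = (int) Math.ceil(3.5 * sigmaD);        // width of domain filter kernel
        float[][] out = new float[M][N];
        for (int u = 0; u < M; u++) {
            for (int v = 0; v < N; v++) {
                double S = 0;                         // sum of weighted pixel values
                double W = 0;                         // sum of weights
                double a = img[u][v];                 // center pixel value
                for (int m = -D; m <= D; m++) {
                    for (int n = -D; n <= D; n++) {
                        double b = img[clamp(u + m, M)][clamp(v + n, N)];           // off-center value
                        double wd = Math.exp(-(m * m + n * n) / (2 * sigmaD * sigmaD)); // domain
                        double wr = Math.exp(-((a - b) * (a - b)) / (2 * sigmaR * sigmaR)); // range
                        double w = wd * wr;           // composite coefficient
                        S += w * b;
                        W += w;
                    }
                }
                out[u][v] = (float) (S / W);
            }
        }
        return out;
    }

    static int clamp(int k, int size) {
        return Math.max(0, Math.min(size - 1, k));
    }
}
```

The cost is proportional to (2D+1)² per pixel. On a step edge whose height is large compared to σ_r, both sides of the edge stay essentially unchanged, whereas a plain Gaussian of the same σ_d would mix them.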

17.2.5  Application  to  Color  Images 

Linear smoothing filters are typically used on color images by separately applying the same filter to the individual color channels. As discussed in Chapter 15, Sec. 15.1, this is legitimate if a suitable working color space is used to avoid the introduction of unnatural intensity and chromaticity values. Thus, for the domain part of the bilateral filter, the same considerations apply as for any linear smoothing filter. However, as will be described, the bilateral filter as a whole cannot be implemented by filtering the color channels separately.

Fig. 17.5
Bilateral filter response when positioned in a flat, noisy image region. Original image function (b), filtered image (c), combined impulse response (a) of the filter at the given position.

Fig. 17.6
Bilateral filter response when positioned on a linear ramp. Original image function (b), filtered image (c), combined impulse response (a) of the filter at the given position.

Fig. 17.7
Bilateral filter response when positioned to the left of a vertical step edge. Original image function (b), filtered image (c), combined impulse response (a) of the filter at the given position.

Fig. 17.8
Bilateral filter response when positioned to the right of a vertical step edge. Original image function (b), filtered image (c), combined impulse response (a) of the filter at the given position.

In the range part of the filter, the weight assigned to each contributing pixel depends on its difference to the value of the center pixel. Given a suitable distance measure $\mathrm{dist}(\mathbf{a}, \mathbf{b})$ between two color vectors $\mathbf{a}$, $\mathbf{b}$, the bilateral filter in Eqn. (17.18) can be easily modified for a color image $I$ to

$$ I'(u,v) = \frac{1}{W_{u,v}} \cdot \sum_{i=-\infty}^{\infty} \sum_{j=-\infty}^{\infty} I(i,j) \cdot H_d(i-u, j-v) \cdot H_r\bigl( \mathrm{dist}(I(i,j), I(u,v)) \bigr), \tag{17.30} $$

with

$$ W_{u,v} = \sum_{i,j} H_d(i-u, j-v) \cdot H_r\bigl( \mathrm{dist}(I(i,j), I(u,v)) \bigr). \tag{17.31} $$

It is common to use one of the popular norms for measuring color distances, such as the $L_1$, $L_2$ (Euclidean), or $L_\infty$ (maximum) norms, for example,

$$ \mathrm{dist}_1(\mathbf{a}, \mathbf{b}) = \frac{1}{3} \cdot \|\mathbf{a} - \mathbf{b}\|_1 = \frac{1}{3} \cdot \sum_{k=1}^{3} |a_k - b_k|, \tag{17.32} $$

$$ \mathrm{dist}_2(\mathbf{a}, \mathbf{b}) = \frac{1}{\sqrt{3}} \cdot \|\mathbf{a} - \mathbf{b}\|_2 = \frac{1}{\sqrt{3}} \cdot \Bigl( \sum_{k=1}^{3} (a_k - b_k)^2 \Bigr)^{1/2}, \tag{17.33} $$

$$ \mathrm{dist}_\infty(\mathbf{a}, \mathbf{b}) = \|\mathbf{a} - \mathbf{b}\|_\infty = \max_k |a_k - b_k|. \tag{17.34} $$
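The three distance functions are straightforward to implement. In this hypothetical sketch, the normalization 1/K with K = a.length reproduces the factors 1/3 and 1/√3 for three-component colors; using K for other component counts is an assumption, since the text only considers RGB.

```java
// Sketch of the color distances in Eqns. (17.32)-(17.34); hypothetical class.
// Assumption: colors are double[] vectors of equal length K (K = 3 for RGB).
final class ColorDistance {

    // Normalized L1 distance, Eqn. (17.32): factor 1/3 for RGB.
    static double dist1(double[] a, double[] b) {
        double sum = 0;
        for (int k = 0; k < a.length; k++) sum += Math.abs(a[k] - b[k]);
        return sum / a.length;
    }

    // Normalized L2 (Euclidean) distance, Eqn. (17.33): ||a - b|| / sqrt(3) for RGB.
    static double dist2(double[] a, double[] b) {
        double sum = 0;
        for (int k = 0; k < a.length; k++) {
            double d = a[k] - b[k];
            sum += d * d;
        }
        return Math.sqrt(sum / a.length);
    }

    // L-infinity (maximum) distance, Eqn. (17.34): needs no normalization.
    static double distInf(double[] a, double[] b) {
        double max = 0;
        for (int k = 0; k < a.length; k++) max = Math.max(max, Math.abs(a[k] - b[k]));
        return max;
    }
}
```

For black (0, 0, 0) versus white (255, 255, 255), all three functions return 255, the same magnitude as the largest possible difference between two 8-bit gray values.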


The  normalizing  factors  1/3  and  l/\/3  in  Eqns.  (17.32)-(17.33)  are 
necessary  to  obtain  results  comparable  in  magnitude  to  those  of 


17.2  Bilateral  Filter 


Fig.  17.9 

Bilateral  filter  response  when 
positioned  at  a  corner.  Origi¬ 
nal  image  function  (b),  filtered 
image  (c),  combined  impulse 
response  (a)  of  the  filter  at  the 
given  position. 


Fig.  17.10 

Bilateral filter response when positioned on a vertical ridge. Original image function (b), filtered image (c), combined impulse response (a) of the filter at the given position.


grayscale images when using the same range kernel H_r.⁷ Of course, in most color spaces, none of these norms measures perceived color difference.⁸ However, the distance function itself is not really critical, since it only affects the relative weights assigned to the contributing

7 For example, with 8-bit RGB color images, dist(a, b) is always in the range [0, 255].

8 The CIELAB and CIELUV color spaces are designed to use the Euclidean distance (L2 norm) as a valid metric for color difference.
Alg. 17.5

Bilateral filter with Gaussian kernels (color version). The function dist(a, b) measures the distance between two colors a and b, for example, using the L2 norm (line 5, see Eqns. (17.32)-(17.34) for other options).
1:  BilateralFilterColor(I, σ_d, σ_r)
        Input: I, a color image of size M × N; σ_d, width of the 2D Gaussian domain kernel; σ_r, width of the 1D Gaussian range kernel. Returns a new filtered color image of size M × N.
2:      (M, N) ← Size(I)
3:      D ← ⌈3.5 · σ_d⌉                                ▷ width of domain filter kernel
4:      I' ← Duplicate(I)
5:      dist(a, b) := (1/√3) · ‖a − b‖                  ▷ color distance (e.g., Euclidean)
6:      for all image coordinates (u, v) ∈ M × N do
7:          S ← 0                                      ▷ S ∈ ℝ³, sum of weighted pixel vectors
8:          W ← 0                                      ▷ sum of pixel weights (scalar)
9:          a ← I(u, v)                                ▷ a ∈ ℝ³, center pixel vector
10:         for m ← −D, ..., D do
11:             for n ← −D, ..., D do
12:                 b ← I(u + m, v + n)                ▷ b ∈ ℝ³, off-center pixel vector
13:                 w_d ← exp(−(m² + n²)/(2σ_d²))      ▷ domain coefficient
14:                 w_r ← exp(−(dist(a, b))²/(2σ_r²))  ▷ range coefficient
15:                 w ← w_d · w_r                      ▷ composite coefficient
16:                 S ← S + w · b
17:                 W ← W + w
18:         I'(u, v) ← (1/W) · S
19:     return I'


color  pixels.  Regardless  of  the  distance  function  used,  the  resulting 
chromaticities  are  linear,  convex  combinations  of  the  original  colors 
in  the  filter  region,  and  thus  the  choice  of  the  working  color  space  is 
more  important  (see  Chapter  15,  Sec.  15.1). 
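The three normalized distance functions of Eqns. (17.32)-(17.34) can be sketched in Java as follows. This is a minimal sketch that assumes a color is passed as a `double[]` of length 3. The class and method names are hypothetical and not part of any library. With the scaling factors 1/3 and 1/√3, all three functions return 255 for black vs. white 8-bit RGB colors, as stated in footnote 7.

```java
/** Hypothetical helper: normalized color distances, Eqns. (17.32)-(17.34). */
final class ColorDistances {

    private ColorDistances() { }

    /** L1 distance scaled by 1/3, Eq. (17.32). */
    static double dist1(double[] a, double[] b) {
        double sum = 0;
        for (int k = 0; k < 3; k++) {
            sum += Math.abs(a[k] - b[k]);
        }
        return sum / 3.0;
    }

    /** Euclidean (L2) distance scaled by 1/sqrt(3), Eq. (17.33). */
    static double dist2(double[] a, double[] b) {
        double sum = 0;
        for (int k = 0; k < 3; k++) {
            double d = a[k] - b[k];
            sum += d * d;
        }
        return Math.sqrt(sum) / Math.sqrt(3.0);
    }

    /** Maximum (L-infinity) distance, Eq. (17.34); no scaling is needed. */
    static double distInf(double[] a, double[] b) {
        double max = 0;
        for (int k = 0; k < 3; k++) {
            max = Math.max(max, Math.abs(a[k] - b[k]));
        }
        return max;
    }
}
```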

The  process  of  bilateral  filtering  for  color  images  (again  using 
Gaussian  kernels  for  the  domain  and  the  range  filters)  is  summarized 
in  Alg.  17.5.  The  Euclidean  distance  (L2  norm)  is  used  to  measure 
the  difference  between  color  vectors.  The  examples  in  Fig.  17.11  were 
produced  using  sRGB  as  the  color  working  space. 
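A direct Java transcription of Alg. 17.5 might look like the following sketch. Several assumptions are made that the algorithm itself leaves open. The image is stored as `img[u][v][k]`, with u the column, v the row, and k the color channel. Out-of-range neighbors are replaced by the nearest border pixel (clamping). The distance is the normalized L2 norm of line 5. The class name is hypothetical.

```java
/** Hypothetical sketch of the color bilateral filter (Alg. 17.5). */
final class BilateralColorSketch {

    private BilateralColorSketch() { }

    static double[][][] filter(double[][][] img, double sigmaD, double sigmaR) {
        int M = img.length, N = img[0].length;
        int D = (int) Math.ceil(3.5 * sigmaD);          // radius of the domain kernel (line 3)
        double[][][] out = new double[M][N][3];
        for (int u = 0; u < M; u++) {
            for (int v = 0; v < N; v++) {
                double[] a = img[u][v];                 // center color vector
                double[] S = new double[3];             // sum of weighted color vectors
                double W = 0;                           // sum of weights
                for (int m = -D; m <= D; m++) {
                    for (int n = -D; n <= D; n++) {
                        double[] b = img[clamp(u + m, M)][clamp(v + n, N)];
                        double wd = Math.exp(-(m * m + n * n) / (2 * sigmaD * sigmaD));
                        double dist = dist2(a, b);
                        double wr = Math.exp(-(dist * dist) / (2 * sigmaR * sigmaR));
                        double w = wd * wr;             // composite coefficient
                        for (int k = 0; k < 3; k++) {
                            S[k] += w * b[k];
                        }
                        W += w;
                    }
                }
                for (int k = 0; k < 3; k++) {
                    out[u][v][k] = S[k] / W;
                }
            }
        }
        return out;
    }

    /** Border handling by clamping (an assumption, not specified in Alg. 17.5). */
    private static int clamp(int i, int size) {
        return Math.max(0, Math.min(size - 1, i));
    }

    /** Normalized Euclidean color distance, Eq. (17.33). */
    private static double dist2(double[] a, double[] b) {
        double sum = 0;
        for (int k = 0; k < 3; k++) {
            double d = a[k] - b[k];
            sum += d * d;
        }
        return Math.sqrt(sum / 3.0);
    }
}
```

With a small σ_r, pixels on the far side of a strong color edge receive negligible range weights. As a result, a sharp two-color step survives the filter almost unchanged.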


17.2.6  Efficient  Implementation  by  x/y  Separation 


The bilateral filter, if implemented in the way described in Algs. 17.4-17.5, is computationally expensive, with a time complexity of O(D²) for each pixel, where D denotes the radius of the filter. Some mild speedup is possible by tabulating the domain and range kernels, but the performance of the brute-force implementation is usually not acceptable for practical applications. In [185] a separable approximation of the bilateral filter is proposed that brings about a significant performance increase. In this implementation, a 1D bilateral filter is first applied in the horizontal direction only. It uses 1D domain and range kernels h_d and h_r, respectively, and produces the intermediate image I^▷, that is,




$$ I^{\triangleright}(u,v) = \frac{\sum_{m=-D}^{D} I(u+m,v) \cdot h_d(m) \cdot h_r\bigl(I(u+m,v) - I(u,v)\bigr)}{\sum_{m=-D}^{D} h_d(m) \cdot h_r\bigl(I(u+m,v) - I(u,v)\bigr)} \qquad (17.35) $$


In the second pass, the same filter is applied to the intermediate result I^▷ in the vertical direction to obtain the final result I' as


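$$ I'(u,v) = \frac{\sum_{n=-D}^{D} I^{\triangleright}(u,v+n) \cdot h_d(n) \cdot h_r\bigl(I^{\triangleright}(u,v+n) - I^{\triangleright}(u,v)\bigr)}{\sum_{n=-D}^{D} h_d(n) \cdot h_r\bigl(I^{\triangleright}(u,v+n) - I^{\triangleright}(u,v)\bigr)} \qquad (17.36) $$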


Fig.  17.11 

Bilateral filter: color example. A Gaussian kernel with σ_d = 2.0 (kernel size 15 × 15) is used for the domain part of the filter. The width of the range filter is varied from σ_r = 10 to 100. The filter was applied in sRGB color space.


for all (u, v), using the same 1D domain and range kernels h_d and h_r, respectively, as in Eqn. (17.35).
For the horizontal part of the filter, the effective space-variant kernel at image position (u, v) is


$$ H^{\triangleright}_{I,u,v}(i) = \frac{h_d(i) \cdot h_r\bigl(I(u+i,v) - I(u,v)\bigr)}{\sum_{m=-D}^{D} h_d(m) \cdot h_r\bigl(I(u+m,v) - I(u,v)\bigr)} \qquad (17.37) $$


for −D ≤ i ≤ D (zero otherwise). Analogously, the effective kernel for the vertical part of the filter is


$$ H^{\triangledown}_{I,u,v}(j) = \frac{h_d(j) \cdot h_r\bigl(I(u,v+j) - I(u,v)\bigr)}{\sum_{n=-D}^{D} h_d(n) \cdot h_r\bigl(I(u,v+n) - I(u,v)\bigr)} \qquad (17.38) $$


again for −D ≤ j ≤ D. For the combined filter, the effective 2D kernel at position (u, v) then is


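$$ H_{I,u,v}(i,j) = \begin{cases} H^{\triangleright}_{I,u,v+j}(i) \cdot H^{\triangledown}_{I^{\triangleright},u,v}(j) & \text{for } -D \le i,j \le D, \\ 0 & \text{otherwise}, \end{cases} \qquad (17.39) $$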


where I is the original image and I^▷ denotes the intermediate image, as defined in Eqn. (17.35).
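The effective horizontal kernel of Eq. (17.37) can be computed explicitly, for example, to plot it as in Fig. 17.12. The following is a minimal sketch that assumes Gaussian kernels for h_d and h_r, a grayscale image stored as `img[u][v]`, and clamping at the image border. The class and method names are hypothetical. The kernel value for offset i is stored at array index i + D.

```java
/** Hypothetical sketch: effective 1D horizontal kernel H(i), Eq. (17.37). */
final class EffectiveKernelSketch {

    private EffectiveKernelSketch() { }

    static double[] horizontalKernel(double[][] img, int u, int v,
                                     double sigmaD, double sigmaR) {
        int M = img.length;
        int D = (int) Math.ceil(3.5 * sigmaD);
        double[] h = new double[2 * D + 1];
        double a = img[u][v];                               // value at the anchor pixel
        double sum = 0;
        for (int i = -D; i <= D; i++) {
            int ui = Math.max(0, Math.min(M - 1, u + i));   // clamp at the border (assumption)
            double diff = img[ui][v] - a;
            double w = Math.exp(-(i * i) / (2 * sigmaD * sigmaD))        // h_d(i)
                     * Math.exp(-(diff * diff) / (2 * sigmaR * sigmaR)); // h_r(diff)
            h[i + D] = w;
            sum += w;
        }
        for (int k = 0; k < h.length; k++) {
            h[k] /= sum;                                    // denominator of Eq. (17.37)
        }
        return h;
    }
}
```

In a flat region the result is simply a normalized 1D Gaussian. Next to a step edge, the coefficients on the far side of the edge drop to practically zero.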

Alternatively,  the  vertical  filter  could  be  applied  first,  followed  by 
the  horizontal  filter.  Algorithm  17.6  shows  a  direct  implementation 
of  the  separable  bilateral  filter  for  grayscale  images,  using  Gaussian 
kernels  for  both  the  domain  and  the  range  parts  of  the  filter.  Again, 
the  extension  to  color  images  is  straightforward  (see  Eqn.  (17.31)  and 
Exercise  17.3). 
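The two passes of Alg. 17.6 map naturally onto one 1D routine that is called twice, once per direction. The sketch below is written under the same assumptions as before: a grayscale image as `img[u][v]`, Gaussian domain and range kernels, and clamping at the border. The class name and the `horizontal` flag are hypothetical.

```java
/** Hypothetical sketch of the separable bilateral filter (Alg. 17.6). */
final class SeparableBilateralSketch {

    private SeparableBilateralSketch() { }

    static double[][] filter(double[][] img, double sigmaD, double sigmaR) {
        double[][] tmp = pass(img, sigmaD, sigmaR, true);    // pass 1, Eq. (17.35)
        return pass(tmp, sigmaD, sigmaR, false);             // pass 2, Eq. (17.36)
    }

    /** One 1D bilateral pass along x (horizontal = true) or y. */
    private static double[][] pass(double[][] src, double sigmaD, double sigmaR,
                                   boolean horizontal) {
        int M = src.length, N = src[0].length;
        int D = (int) Math.ceil(3.5 * sigmaD);
        double[][] dst = new double[M][N];
        for (int u = 0; u < M; u++) {
            for (int v = 0; v < N; v++) {
                double a = src[u][v];
                double S = 0, W = 0;
                for (int m = -D; m <= D; m++) {
                    double b = horizontal
                            ? src[clamp(u + m, M)][v]
                            : src[u][clamp(v + m, N)];
                    double wd = Math.exp(-(m * m) / (2 * sigmaD * sigmaD));
                    double wr = Math.exp(-((a - b) * (a - b)) / (2 * sigmaR * sigmaR));
                    double w = wd * wr;
                    S += w * b;
                    W += w;
                }
                dst[u][v] = S / W;
            }
        }
        return dst;
    }

    private static int clamp(int i, int size) {
        return Math.max(0, Math.min(size - 1, i));
    }
}
```

Swapping the two `pass` calls gives the y/x variant. As noted below, its results differ slightly from the x/y order.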

As intended, the advantage of the separable filter is performance. For a given kernel radius D, the original (non-separable) filter requires O(D²) calculations for each pixel, while the separable version takes only O(D) steps. This means a substantial saving and speed increase, particularly for large filters.
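As a worked example, take σ_d = 2.0 as in Fig. 17.11. This gives D = ⌈3.5 · 2.0⌉ = 7 and a 15 × 15 support. The full filter then evaluates (2D + 1)² = 225 kernel terms per pixel, while the two 1D passes evaluate 2 · (2D + 1) = 30, or 7.5 times fewer. The tiny helper below (hypothetical names) only counts kernel evaluations and ignores constant overheads.

```java
/** Hypothetical helper: kernel evaluations per pixel, full vs. separable. */
final class BilateralCostSketch {

    private BilateralCostSketch() { }

    /** Full 2D bilateral filter: (2D+1)^2 terms per pixel. */
    static long fullCost(int D) {
        long side = 2L * D + 1;
        return side * side;
    }

    /** Separable version: two 1D passes of (2D+1) terms each. */
    static long separableCost(int D) {
        return 2L * (2L * D + 1);
    }
}
```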

Figure 17.12 shows the response of the separable bilateral filter in various situations. The results produced by the separable filter are very similar to those obtained with the original filter in Figs. 17.5-17.9, partly because the local structures in these images are parallel to the coordinate axes. In general, the results are different, as demonstrated for a diagonal step edge in Fig. 17.13. The effective filter kernels are shown in Fig. 17.13(g, h) for an anchor point positioned on the bright side of the edge. It can be seen that, while the kernel of the full filter (Fig. 17.13(g)) is orientation-insensitive, the upper part of the separable kernel is clearly truncated in Fig. 17.13(h). But although the separable bilateral filter is sensitive to local structure orientation, it performs well and is usually a sufficient substitute for the non-separable version [185]. The color examples shown in Fig. 17.14 demonstrate the effects of 1D bilateral filtering in the x- and y-directions. Note that the results are not exactly the same if the filter is first applied in the x- or in the y-direction, but usually the differences are negligible.
Fig. 17.12

Response of a separable bilateral filter in various situations. Effective kernel H_{I,u,v} (Eqn. (17.39)) at the center pixel (a-e), original image data (f-j), filtered image data (k-o). Settings are the same as in Figs. 17.5-17.9.


[Fig. 17.13 panel titles: Original image | Full bilateral filter | Separable version]


Fig.  17.13 

Bilateral filter: full vs. separable version. Original image (a) and enlarged detail (d). Results of the full bilateral filter (b, e) and the separable version (c, f). The corresponding local filter kernels for the center pixel (positioned on the bright side of the step edge) for the full filter (g) and the separable version (h). Note how the upper part of the kernel in (h) is truncated along the horizontal axis, which shows that the separable filter is orientation-sensitive. In both cases, σ_d = 2.0, σ_r = 25.
Alg. 17.6

Separable bilateral filter with Gaussian kernels (adapted from Alg. 17.4). The input image is processed in two passes. In each pass, a 1D kernel is applied in horizontal or vertical direction, respectively (see Eqns. (17.35)-(17.36)). Note that results of the separable filter are similar (but not identical) to the full (2D) bilateral filter in Alg. 17.4.
1:  BilateralFilterGraySeparable(I, σ_d, σ_r)
        Input: I, a grayscale image of size M × N; σ_d, width of the 2D Gaussian domain kernel; σ_r, width of the 1D Gaussian range kernel. Returns a new filtered image of size M × N.
2:      (M, N) ← Size(I)
3:      D ← ⌈3.5 · σ_d⌉                            ▷ width of domain filter kernel
4:      I^▷ ← Duplicate(I)
        Pass 1 (horizontal):
5:      for all coordinates (u, v) ∈ M × N do
6:          a ← I(u, v)
7:          S ← 0, W ← 0
8:          for m ← −D, ..., D do
9:              b ← I(u + m, v)
10:             w_d ← exp(−m²/(2σ_d²))             ▷ domain kernel coeff. h_d(m)
11:             w_r ← exp(−(a − b)²/(2σ_r²))       ▷ range kernel coeff. h_r(b)
12:             w ← w_d · w_r                      ▷ composite filter coeff.
13:             S ← S + w · b
14:             W ← W + w
15:         I^▷(u, v) ← S/W                        ▷ see Eq. 17.35
16:     I' ← Duplicate(I)
        Pass 2 (vertical):
17:     for all coordinates (u, v) ∈ M × N do
18:         a ← I^▷(u, v)
19:         S ← 0, W ← 0
20:         for n ← −D, ..., D do
21:             b ← I^▷(u, v + n)
22:             w_d ← exp(−n²/(2σ_d²))             ▷ domain kernel coeff. h_d(n)
23:             w_r ← exp(−(a − b)²/(2σ_r²))       ▷ range kernel coeff. h_r(b)
24:             w ← w_d · w_r                      ▷ composite filter coeff.
25:             S ← S + w · b
26:             W ← W + w
27:         I'(u, v) ← S/W                         ▷ see Eq. 17.36
28:     return I'

17.2.7  Further  Reading 

A thorough analysis of the bilateral filter as well as its relationship to adaptive smoothing and nonlinear diffusion can be found in [16] and [67]. In addition to the simple separable implementation described, several other fast versions of the bilateral filter have been proposed. For example, the method described in [65] approximates the bilateral filter by filtering sub-sampled copies of the image with discrete intensity kernels and recombining the results using linear interpolation. An improved and theoretically well-grounded version of this method was presented in [179]. The fast technique proposed in [253] eliminates the redundant calculations performed in partly overlapping image regions, albeit restricted to the use of box-shaped domain kernels. As demonstrated in [187,259], real-time performance using arbitrary-shaped kernels can be obtained by decomposing the filter into a set of smaller spatial filters.


Fig. 17.14

Separable bilateral filter (color example). Original image (a), bilateral filter applied only in the x-direction (b) and only in the y-direction (c). Result of applying the full bilateral filter (d) and the separable bilateral filter applied in x/y order (e) and y/x order (f). Settings: σ_d = 2.0, σ_r = 50, L2 color distance.


17.3  Anisotropic  Diffusion  Filters 

Diffusion  is  a  concept  adopted  from  physics  that  models  the  spatial 
propagation  of  particles  or  state  properties  within  substances.  In  the 
real  world,  certain  physical  properties  (such  as  temperature)  tend  to 
diffuse  homogeneously  through  a  physical  body,  that  is,  equally  in  all 
directions. The idea of viewing image smoothing as a diffusion process
has  a  long  history  in  image  processing  (see,  e.g.,  [11,139]).  To  smooth 
an  image  and,  at  the  same  time,  preserve  edges  or  other  “interesting” 
image  structures,  the  diffusion  process  must  somehow  be  made  locally 
non-homogeneous;  otherwise  the  entire  image  would  come  out  equally 
blurred.  Typically,  the  dominant  smoothing  direction  is  chosen  to  be 
parallel  to  nearby  image  contours,  while  smoothing  is  inhibited  in 
the  perpendicular  direction,  that  is,  across  the  contours. 

Since the pioneering work by Perona and Malik [182], anisotropic diffusion has seen continued interest in the image processing community, and research in this area is still strong today. The main elements of their approach are outlined in Sec. 17.3.2. While various other formulations have been proposed since, a key contribution by Weickert [250,251] and Tschumperlé [233,236] unified them into a common framework and demonstrated their extension to color images. They also proposed to separate the actual smoothing process from the smoothing geometry in order to obtain better control of the local smoothing behavior. In Sec. 17.3.4 we give a brief introduction to the approach proposed by Tschumperlé and Deriche, as initially described in [233]. Beyond these selected examples, a vast literature exists on this topic, including excellent reviews [95,250], textbook material [125,205], and journal articles (see [3,45,52,173,206,226], for example).
17.3.1 Homogeneous Diffusion and the Heat Equation


Assume that in a homogeneous 3D volume some physical property (e.g., temperature) is specified by a continuous function f(x, t) at position x = (x, y, z) and time t. With the system left to itself, the local differences in the property f will equalize over time until a global equilibrium is reached. This diffusion process in 3D space (x, y, z) and time (t) can be expressed using a partial differential equation (PDE),

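$$ \frac{\partial f}{\partial t} = c \cdot (\nabla^2 f) = c \cdot \Bigl(\frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} + \frac{\partial^2 f}{\partial z^2}\Bigr). \qquad (17.40) $$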

This is the so-called heat equation, where ∇²f denotes the Laplace operator⁹ applied to the scalar-valued function f, and c is a constant that describes the (thermal) conductivity or conductivity coefficient of the material. Since the conductivity is independent of position and orientation (c is constant), the resulting process is isotropic, that is, the heat spreads evenly in all directions.

For simplicity, we assume c = 1. Since f is a multi-dimensional function in space and time, we make this fact a bit more transparent by attaching explicit space and time coordinates x and τ to Eqn. (17.40), that is,


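$$ \frac{\partial f}{\partial t}(\boldsymbol{x},\tau) = \frac{\partial^2 f}{\partial x^2}(\boldsymbol{x},\tau) + \frac{\partial^2 f}{\partial y^2}(\boldsymbol{x},\tau) + \frac{\partial^2 f}{\partial z^2}(\boldsymbol{x},\tau), \qquad (17.41) $$

or, written more compactly,

$$ f_t(\boldsymbol{x},\tau) = f_{xx}(\boldsymbol{x},\tau) + f_{yy}(\boldsymbol{x},\tau) + f_{zz}(\boldsymbol{x},\tau). \qquad (17.42) $$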


Diffusion  in  images 

A continuous, time-varying image I may be treated analogously to the function f(x, τ), with the local intensities taking on the role of the temperature values in Eqn. (17.42). In this 2D case, the isotropic diffusion equation can be written as¹⁰


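$$ \frac{\partial I}{\partial t} = \nabla^2 I = \frac{\partial^2 I}{\partial x^2} + \frac{\partial^2 I}{\partial y^2} \qquad (17.43) $$

or

$$ I_t(\boldsymbol{x},\tau) = I_{xx}(\boldsymbol{x},\tau) + I_{yy}(\boldsymbol{x},\tau), \qquad (17.44) $$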


with the derivatives I_t = ∂I/∂t, I_xx = ∂²I/∂x², and I_yy = ∂²I/∂y². An approximate numerical solution of such a PDE can be obtained by replacing the derivatives with finite differences.

Starting with the initial (typically noisy) image I^(0) = I, the solution to the differential equation in Eqn. (17.44) can be calculated iteratively in the form

9 Remember that ∇f denotes the gradient of the function f, which is a vector for any multi-dimensional function. The Laplace operator (or Laplacian) ∇²f corresponds to the divergence of the gradient of f, denoted div ∇f, which is a scalar value (see Secs. C.2.5 and C.2.4 in the Appendix). Other notations for the Laplacian are ∇·(∇f), (∇·∇)f, ∇·∇f, ∇²f, or Δf.

10 Function arguments (x, τ) are omitted here for better readability.
[Fig. 17.15 panels (a)-(f): n = 0, 5, 10, 20, 40, 80 iterations; equivalent Gaussian width σ_n = 1.411, 1.996, 2.823, 3.992, 5.646 for n = 5, 10, 20, 40, 80]


Fig.  17.15 

Discrete isotropic diffusion. Blurred images and impulse response obtained after n iterations, with α = 0.20 (see Eqn. (17.45)). The size of the images is 50 × 50. The width of the equivalent Gaussian kernel (σ_n) grows with the square root of n (the number of iterations). Impulse response plots are normalized to identical peak values.


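$$ I^{(n)}(u) \leftarrow \begin{cases} I(u) & \text{for } n = 0, \\ I^{(n-1)}(u) + \alpha \cdot \bigl[\nabla^2 I^{(n-1)}\bigr](u) & \text{for } n > 0, \end{cases} \qquad (17.45) $$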


for each image position u = (u, v), with n denoting the iteration number. This is called the "direct" solution method (there are other methods, but this is the simplest). The constant α in Eqn. (17.45) is the time increment, which controls the speed of the diffusion process. Its value should be in the range (0, 0.25] for the numerical scheme to be stable. At each iteration n, the variations in the image function are reduced and (depending on the boundary conditions) the image function should eventually flatten out to a constant plane as n approaches infinity.

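For a discrete image I, the Laplacian ∇²I in Eqn. (17.45) can be approximated by a linear 2D filter,

$$ \nabla^2 I \approx I * H^{L}, \quad \text{with} \quad H^{L} = \begin{bmatrix} 0 & 1 & 0 \\ 1 & -4 & 1 \\ 0 & 1 & 0 \end{bmatrix}, \qquad (17.46) $$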


as described earlier.¹¹ An essential property of isotropic diffusion is that it has the same effect as a Gaussian filter whose width grows with the elapsed time. For a discrete 2D image, in particular, the result obtained after n diffusion steps (Eqn. (17.45)) is the same as applying a linear filter to the original image I,

$$ I^{(n)} = I * H^{G,\sigma_n}, \qquad (17.47) $$

with the normalized Gaussian kernel

$$ H^{G,\sigma_n}(x,y) = \frac{1}{2\pi\sigma_n^2} \cdot e^{-\frac{x^2+y^2}{2\sigma_n^2}} \qquad (17.48) $$

of width σ_n = √(2t) = √(2n·α). The example in Fig. 17.15 illustrates this Gaussian smoothing behavior obtained with discrete isotropic diffusion.
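The direct scheme of Eqns. (17.45)-(17.46) takes only a few lines of Java. In the sketch below, neighbors outside the image contribute nothing to the Laplacian, which corresponds to a zero-flux boundary. This boundary choice is an assumption, and the class name is hypothetical. Each step convolves the image with a kernel of weight 1 − 4α at the center and α at the four neighbors. Each step therefore adds exactly 2α to the variance along x, so after n steps an impulse has spread to σ_n² = 2nα. This agrees with σ_n = √(2n·α), provided the impulse has not yet reached the image border.

```java
/** Hypothetical sketch: discrete isotropic diffusion, Eqns. (17.45)-(17.46). */
final class IsotropicDiffusionSketch {

    private IsotropicDiffusionSketch() { }

    static double[][] diffuse(double[][] img, double alpha, int iterations) {
        int M = img.length, N = img[0].length;
        double[][] cur = new double[M][];
        for (int u = 0; u < M; u++) {
            cur[u] = img[u].clone();                    // I^(0) = I
        }
        for (int n = 1; n <= iterations; n++) {
            double[][] next = new double[M][N];
            for (int u = 0; u < M; u++) {
                for (int v = 0; v < N; v++) {
                    double c = cur[u][v];
                    // 4-neighbor Laplacian (H^L); missing neighbors contribute 0
                    double lap = 0;
                    if (u + 1 < M) lap += cur[u + 1][v] - c;
                    if (u > 0)     lap += cur[u - 1][v] - c;
                    if (v + 1 < N) lap += cur[u][v + 1] - c;
                    if (v > 0)     lap += cur[u][v - 1] - c;
                    next[u][v] = c + alpha * lap;       // Eq. (17.45), n > 0
                }
            }
            cur = next;
        }
        return cur;
    }
}
```

For an impulse in a 51 × 51 image with α = 0.2 and n = 10, the measured x-variance of the result is 4.0, that is, σ_10 = 2.0.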


11 See also Chapter 6, Sec. 6.6.1 and Sec. C.3.1 in the Appendix.
17.3.2 Perona-Malik Filter


Isotropic diffusion, as we have described, is a homogeneous operation that is independent of the underlying image content. Like any Gaussian filter, it effectively suppresses image noise but also tends to blur away sharp boundaries and detailed structures, a property that is often undesirable. The idea proposed in [182] is to make the conductivity coefficient variable and dependent on the local image structure. This is done by replacing the conductivity constant c in Eqn. (17.40), which can be written as

$$ \frac{\partial I}{\partial t}(\boldsymbol{x},\tau) = c \cdot [\nabla^2 I](\boldsymbol{x},\tau), \qquad (17.49) $$

by a function c(x, τ) that varies over space x and time τ, that is,

$$ \frac{\partial I}{\partial t}(\boldsymbol{x},\tau) = c(\boldsymbol{x},\tau) \cdot [\nabla^2 I](\boldsymbol{x},\tau). \qquad (17.50) $$


If  the  conductivity  function  c()  is  constant,  then  the  equation  reduces 
to  the  isotropic  diffusion  model  in  Eqn.  (17.44). 

Different behaviors can be implemented by selecting a particular function c(). To achieve edge-preserving smoothing, the conductivity c() is chosen as a function of the magnitude of the local gradient vector ∇I, that is,

$$ c(\boldsymbol{x},\tau) := g(d) = g\bigl(\lVert [\nabla I^{(\tau)}](\boldsymbol{x}) \rVert\bigr). \qquad (17.51) $$


To preserve edges, the function g(d): ℝ → [0, 1] should return high values in areas of low image gradient, enabling smoothing of homogeneous regions, but return low values (and thus inhibit smoothing) where the local brightness changes rapidly. Commonly used conductivity functions g(d) are, for example [48,182],

$$ g_1(d) = e^{-(d/\kappa)^2}, \qquad g_2(d) = \frac{1}{1+(d/\kappa)^2}, $$

$$ g_3(d) = \frac{1}{\sqrt{1+(d/\kappa)^2}}, \qquad g_4(d) = \begin{cases} \bigl(1-(d/2\kappa)^2\bigr)^2 & \text{for } d \le 2\kappa, \\ 0 & \text{otherwise}, \end{cases} \qquad (17.52) $$


where κ > 0 is a constant that is either set manually (typically in the range [5, 50] for 8-bit images) or adjusted to the amount of image noise. Graphs of the four functions in Eqn. (17.52) are shown in Fig. 17.16 for selected values of κ. The Gaussian conductivity function g1 tends to promote high-contrast edges, whereas g2 and, even more so, g3 prefer wide, flat regions over smaller ones. Function g4, which corresponds to Tukey's biweight function known from robust statistics [205, p. 230], is strictly zero for any argument d > 2κ. The exact shape of the function g() does not appear to be critical; other functions with similar properties (e.g., with a linear cutoff) are sometimes used instead.
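The four conductivity functions of Eq. (17.52) translate directly into Java. The sketch below assumes that d is a non-negative gradient magnitude and uses a hypothetical class name. Note that g2 is 1/2 and g4 is 0.5625 at d = κ, and that g4 vanishes for all d ≥ 2κ.

```java
/** Hypothetical helper: conductivity functions g1..g4 of Eq. (17.52). */
final class Conductivity {

    private Conductivity() { }

    /** Gaussian: exp(-(d/kappa)^2). */
    static double g1(double d, double kappa) {
        double r = d / kappa;
        return Math.exp(-r * r);
    }

    /** 1 / (1 + (d/kappa)^2). */
    static double g2(double d, double kappa) {
        double r = d / kappa;
        return 1.0 / (1.0 + r * r);
    }

    /** 1 / sqrt(1 + (d/kappa)^2). */
    static double g3(double d, double kappa) {
        double r = d / kappa;
        return 1.0 / Math.sqrt(1.0 + r * r);
    }

    /** Tukey's biweight: (1 - (d/(2 kappa))^2)^2 for d <= 2 kappa, else 0. */
    static double g4(double d, double kappa) {
        if (d > 2 * kappa) {
            return 0.0;
        }
        double r = d / (2 * kappa);
        double t = 1.0 - r * r;
        return t * t;
    }
}
```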

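As an approximate discretization of Eqn. (17.50), Perona and Malik [182] proposed the simple iterative scheme

$$ I^{(n)}(u) \leftarrow I^{(n-1)}(u) + \alpha \cdot \sum_{i=0}^{3} g\bigl(|\delta_i(I^{(n-1)},u)|\bigr) \cdot \delta_i(I^{(n-1)},u), \qquad (17.53) $$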
Fig. 17.16

Typical conductivity functions g1(), ..., g4() for κ = 4, 10, 20, 30, 40 (see Eqn. (17.52)). If the magnitude of the local gradient d is small (near zero), smoothing amounts to a maximum (1.0), whereas diffusion is reduced where the gradient is high, for example, at or near edges. Smaller values of κ result in narrower curves, thereby restricting the smoothing operation to image areas with only small variations.


[Fig. 17.17 lattice: rows v−1, v, v+1; columns u−1, u, u+1]


Fig.  17.17 

Discrete lattice used for implementing diffusion filters in the Perona-Malik algorithm. The green element represents the current image pixel at position u = (u, v) and the yellow elements are its direct 4-neighbors.


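where I^(0) = I is the original image and

$$ \delta_i(I,u) = I(u + d_i) - I(u) \qquad (17.54) $$

denotes the difference between the pixel value I(u) and its direct neighbor i = 0, ..., 3 (see Fig. 17.17), with

$$ d_0 = \begin{pmatrix}1\\0\end{pmatrix}, \quad d_1 = \begin{pmatrix}0\\1\end{pmatrix}, \quad d_2 = -\begin{pmatrix}1\\0\end{pmatrix}, \quad d_3 = -\begin{pmatrix}0\\1\end{pmatrix}. \qquad (17.55) $$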

The procedure for computing the Perona-Malik filter for scalar-valued images is summarized in Alg. 17.7. The examples in Fig. 17.18 demonstrate how this filter performs along a step edge in a noisy grayscale image compared to isotropic (i.e., Gaussian) filtering.
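A Java sketch of Alg. 17.7 is shown below. It uses g1 from Eq. (17.52) and stores the image as `img[u][v]`. The class name is hypothetical. As in the algorithm, all forward differences are computed into D_x and D_y before any pixel is updated, so every update uses values from iteration n − 1. Because each pair of neighbors exchanges equal and opposite amounts, the scheme also preserves the total image intensity.

```java
/** Hypothetical sketch of the Perona-Malik filter for grayscale images (Alg. 17.7). */
final class PeronaMalikGraySketch {

    private PeronaMalikGraySketch() { }

    /** Conductivity g1 of Eq. (17.52); any similar function could be substituted. */
    private static double g(double d, double kappa) {
        double r = d / kappa;
        return Math.exp(-r * r);
    }

    /** Filters img in place (as in Alg. 17.7) and returns it. */
    static double[][] filter(double[][] img, double alpha, double kappa, int T) {
        int M = img.length, N = img[0].length;
        double[][] dx = new double[M][N];
        double[][] dy = new double[M][N];
        for (int n = 1; n <= T; n++) {
            // lines 6-8: forward differences, zero at the right/bottom border
            for (int u = 0; u < M; u++) {
                for (int v = 0; v < N; v++) {
                    dx[u][v] = (u < M - 1) ? img[u + 1][v] - img[u][v] : 0.0;
                    dy[u][v] = (v < N - 1) ? img[u][v + 1] - img[u][v] : 0.0;
                }
            }
            // lines 9-14: neighbor differences delta_0..delta_3 and update
            for (int u = 0; u < M; u++) {
                for (int v = 0; v < N; v++) {
                    double d0 = dx[u][v];
                    double d1 = dy[u][v];
                    double d2 = (u > 0) ? -dx[u - 1][v] : 0.0;
                    double d3 = (v > 0) ? -dy[u][v - 1] : 0.0;
                    img[u][v] += alpha * (g(Math.abs(d0), kappa) * d0
                                        + g(Math.abs(d1), kappa) * d1
                                        + g(Math.abs(d2), kappa) * d2
                                        + g(Math.abs(d3), kappa) * d3);
                }
            }
        }
        return img;
    }
}
```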

In summary, the principal operation of this filter is to inhibit smoothing in the direction of strong local gradient vectors. Wherever the local contrast (and thus the gradient) is small, diffusion occurs uniformly in all directions, effectively implementing a Gaussian smoothing filter. However, in locations of high gradients, smoothing is inhibited along the gradient direction and allowed only in the direction perpendicular to it. If viewed as a heat diffusion process, a high-gradient brightness edge in an image acts like an insulating layer between areas of different temperatures. While temperatures
Alg. 17.7

Perona-Malik anisotropic diffusion filter for scalar (grayscale) images. The input image I is assumed to be real-valued (floating-point). Temporary real-valued maps D_x, D_y are used to hold the directional gradient values, which are then re-calculated in every iteration. The conductivity function g(d) can be one of the functions defined in Eqn. (17.52), or any similar function.


1:  PeronaMalikGray(I, α, κ, T)
        Input: I, a grayscale image of size M × N; α, update rate; κ, smoothness parameter; T, number of iterations. Returns the modified image I.
        Specify the conductivity function:
2:      g(d) := exp(−(d/κ)²)                      ▷ for example, see alternatives in Eq. 17.52
3:      (M, N) ← Size(I)
4:      Create maps D_x, D_y : M × N → ℝ
5:      for n ← 1, ..., T do                      ▷ perform T iterations
6:          for all coordinates (u, v) ∈ M × N do ▷ re-calculate gradients
7:              D_x(u, v) ← I(u+1, v) − I(u, v) if u < M−1, 0 otherwise
8:              D_y(u, v) ← I(u, v+1) − I(u, v) if v < N−1, 0 otherwise
9:          for all coordinates (u, v) ∈ M × N do
10:             δ_0 ← D_x(u, v)
11:             δ_1 ← D_y(u, v)
12:             δ_2 ← −D_x(u−1, v) if u > 0, 0 otherwise
13:             δ_3 ← −D_y(u, v−1) if v > 0, 0 otherwise
14:             I(u, v) ← I(u, v) + α · Σ_{k=0}^{3} g(|δ_k|) · δ_k   ▷ update the image
15:     return I
continuously level out in the homogeneous regions on either side of an edge, thermal energy does not diffuse across the edge itself.

Note that the Perona-Malik filter (as defined in Eqn. (17.50)) is formally considered a nonlinear filter but not an anisotropic diffusion filter, because the conductivity function g() is only a scalar and not a (directed) vector-valued function [250]. However, the (inexact) discretization used in Eqn. (17.53), where each lattice direction is attenuated individually, makes the filter appear to perform in an anisotropic fashion.

17.3.3 Perona-Malik Filter for Color Images

The original Perona-Malik filter is not explicitly designed for color images or vector-valued images in general. The simplest way to apply this filter to a color image is (as usual) to treat the color channels as a set of independent scalar images and filter them separately. Edges should be preserved, since they occur only where at least one of the color channels exhibits a strong variation. However, different filters are applied to the color channels, and thus new chromaticities may be produced that were not contained in the original image. Nevertheless, the results obtained (see the examples in Fig. 17.19(b-d)) are often satisfactory, and the approach is frequently used because of its simplicity.


[Fig. 17.18 panel columns: n = 0, 2, 5, 10]
Fig. 17.18

Isotropic vs. anisotropic diffusion applied to a noisy step edge. Original image, enlarged detail, and horizontal profile (a), results of isotropic diffusion (b-d), results of anisotropic diffusion (e-g) after n = 2, 5, 10 iterations, respectively (α = 0.20, κ = 40).


Color  diffusion  based  on  the  brightness  gradient 

As an alternative to filtering each color channel separately, it has been proposed to use only the brightness (intensity) component to control the diffusion process of all color channels. Given an RGB color image I = (I_R, I_G, I_B) and a brightness function β(I), the iterative scheme in Eqn. (17.53) could be modified to


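$$ I^{(n)}(u) \leftarrow I^{(n-1)}(u) + \alpha \cdot \sum_{i=0}^{3} g\bigl(|\beta_i(I^{(n-1)},u)|\bigr) \cdot \boldsymbol{\delta}_i(I^{(n-1)},u), \qquad (17.56) $$

where

$$ \beta_i(I,u) = \beta\bigl(I(u+d_i)\bigr) - \beta\bigl(I(u)\bigr) \qquad (17.57) $$

is the local brightness difference (with d_i as defined in Eqn. (17.55)) and

$$ \boldsymbol{\delta}_i(I,u) = \begin{pmatrix} I_R(u+d_i) - I_R(u) \\ I_G(u+d_i) - I_G(u) \\ I_B(u+d_i) - I_B(u) \end{pmatrix} = \begin{pmatrix} \delta_i(I_R,u) \\ \delta_i(I_G,u) \\ \delta_i(I_B,u) \end{pmatrix} \qquad (17.58) $$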
Fig. 17.19

Anisotropic diffusion filter (color). Noisy test image (a). Anisotropic diffusion filter applied separately to individual color channels (b-d), diffusion controlled by brightness gradient (e-g), diffusion controlled by color gradient (h-j), after 2, 5, and 10 iterations, respectively (α = 0.20, κ = 40). With diffusion controlled by the brightness gradient, strong blurring occurs between regions of different color but similar brightness (e-g). The most consistent results are obtained by diffusion controlled by the color gradient (h-j). Filtering was performed in linear RGB color space.


is the local color difference vector for the neighboring pixels in directions i = 0, ..., 3 (see Fig. 17.17). Typical choices for the brightness function β() are the luminance Y (calculated as a weighted sum of the linear R, G, B components), luma Y' (from nonlinear R', G', B' components), or the lightness component L of the CIELAB and CIELUV color spaces (see Chapter 15, Sec. 15.1 for a detailed discussion).

Algorithm 17.7 can be easily adapted to implement this type of color filter. An obvious disadvantage of this method is that it naturally blurs across color edges if the neighboring colors are of similar brightness, as the examples in Fig. 17.19(e-g) demonstrate. This limits its usefulness for practical applications.
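The brightness-controlled scheme of Eqns. (17.56)-(17.58) can be sketched in Java as follows. Several choices here are assumptions rather than part of the text. β is taken as the luminance of linear RGB with the ITU-R BT.709 weights (0.2126, 0.7152, 0.0722). g1 is used as the conductivity function. Neighbors outside the image contribute a zero difference. The image is stored as `img[u][v][k]`, and the class name is hypothetical. The single weight per direction, computed from the brightness difference only, is applied to all three channels.

```java
/** Hypothetical sketch: color diffusion controlled by brightness, Eqns. (17.56)-(17.58). */
final class PeronaMalikBrightnessSketch {

    private PeronaMalikBrightnessSketch() { }

    /** Assumed brightness function: BT.709 luminance of linear RGB. */
    static double beta(double[] rgb) {
        return 0.2126 * rgb[0] + 0.7152 * rgb[1] + 0.0722 * rgb[2];
    }

    private static double g(double d, double kappa) {   // g1 of Eq. (17.52)
        double r = d / kappa;
        return Math.exp(-r * r);
    }

    static double[][][] filter(double[][][] img, double alpha, double kappa, int T) {
        int M = img.length, N = img[0].length;
        int[][] d = {{1, 0}, {0, 1}, {-1, 0}, {0, -1}};  // d_0..d_3, Eq. (17.55)
        double[][][] cur = img;
        for (int n = 1; n <= T; n++) {
            double[][][] next = new double[M][N][3];
            for (int u = 0; u < M; u++) {
                for (int v = 0; v < N; v++) {
                    double[] c = cur[u][v];
                    double[] sum = new double[3];
                    for (int[] di : d) {
                        int uu = u + di[0], vv = v + di[1];
                        if (uu < 0 || uu >= M || vv < 0 || vv >= N) {
                            continue;                    // missing neighbor: zero difference
                        }
                        double[] nb = cur[uu][vv];
                        double w = g(Math.abs(beta(nb) - beta(c)), kappa);  // from beta_i only
                        for (int k = 0; k < 3; k++) {
                            sum[k] += w * (nb[k] - c[k]);                   // w * delta_i
                        }
                    }
                    for (int k = 0; k < 3; k++) {
                        next[u][v][k] = c[k] + alpha * sum[k];
                    }
                }
            }
            cur = next;
        }
        return cur;
    }
}
```

Because every mixture of two equal-luminance colors has the same luminance, an edge between such colors never inhibits diffusion in this scheme. This reproduces, in miniature, the blurring visible in Fig. 17.19(e-g).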


Using  the  color  gradient 

A better option for controlling the diffusion process in all three color channels is to use the color gradient (see Ch. 16, Sec. 16.2.1). As defined in Eqn. (16.17), the color gradient

$$ (\mathrm{grad}_\theta I)(u) = I_x(u) \cdot \cos(\theta) + I_y(u) \cdot \sin(\theta) \qquad (17.59) $$

is a 3D vector, representing the combined variations of the color image I at position u in a given direction θ. The squared norm of this vector, S_θ(I, u) = ‖(grad_θ I)(u)‖², called the squared local contrast, is a scalar quantity useful for color edge detection. Along the horizontal and vertical directions of the discrete diffusion lattice (see Fig. 17.17), the angle θ is a multiple of π/2, and thus one of the cosine/sine terms in Eqn. (17.59) vanishes, that is,
$$ \bigl\lVert (\mathrm{grad}_{i\pi/2}\, I)(u) \bigr\rVert = \begin{cases} \lVert I_x(u) \rVert & \text{for } i = 0, 2, \\ \lVert I_y(u) \rVert & \text{for } i = 1, 3. \end{cases} \qquad (17.60) $$


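Taking δ_i (Eqn. (17.58)) as an estimate for the horizontal and vertical derivatives I_x, I_y, the diffusion iteration (adapted from Eqn. (17.53)) thus becomes

$$ I^{(n)}(u) \leftarrow I^{(n-1)}(u) + \alpha \cdot \sum_{i=0}^{3} g\bigl(\lVert \boldsymbol{\delta}_i(I^{(n-1)},u) \rVert\bigr) \cdot \boldsymbol{\delta}_i(I^{(n-1)},u), \qquad (17.61) $$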


with g() chosen from one of the conductivity functions in Eqn. (17.52). Note that this is almost identical to the formulation in Eqn. (17.53), except for the use of vector-valued images and the absolute values |·| being replaced by the vector norm ‖·‖. The diffusion process is coupled between color channels, because the local diffusion strength depends on the combined color difference vectors. Thus, unlike in the brightness-governed diffusion scheme in Eqn. (17.56), opposing variations in different color channels do not cancel out, and edges between colors of similar brightness are preserved (see the examples in Fig. 17.19(h-j)).

The resulting process is summarized in Alg. 17.8. The algorithm assumes that the components of the color image $I$ are real-valued. In practice, integer-valued images must be converted to floating point before this procedure can be applied, and integer results should be recovered by appropriate rounding.
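The following Java sketch performs one iteration of Eqn. (17.61) on a real-valued image stored as img[u][v][k]. It assumes the conductivity $g(d) = e^{-(d/\kappa)^2}$, treats differences to pixels outside the image as zero (as Alg. 17.8 does), and, unlike Alg. 17.8, writes the result to a new array instead of updating in place. The class name PeronaMalikColorStep is hypothetical and is not the imagingbook PeronaMalikFilter.

```java
/** Hypothetical sketch (not the imagingbook class): one color-gradient Perona-Malik step. */
final class PeronaMalikColorStep {

    /** Conductivity g(d) = exp(-(d/kappa)^2), one of the options of Eqn. (17.52). */
    static double g(double d, double kappa) {
        double r = d / kappa;
        return Math.exp(-r * r);
    }

    /**
     * Applies Eqn. (17.61) once to img[u][v][k] (M x N pixels, K real-valued channels)
     * and returns a new array; differences to pixels outside the image are taken as 0.
     */
    static double[][][] step(double[][][] img, double alpha, double kappa) {
        int M = img.length, N = img[0].length, K = img[0][0].length;
        int[][] offsets = {{1, 0}, {0, 1}, {-1, 0}, {0, -1}};   // directions i = 0..3
        double[][][] out = new double[M][N][K];
        for (int u = 0; u < M; u++) {
            for (int v = 0; v < N; v++) {
                double[] acc = new double[K];
                for (int[] d : offsets) {
                    int uu = u + d[0], vv = v + d[1];
                    if (uu < 0 || uu >= M || vv < 0 || vv >= N) {
                        continue;   // border: delta_i = 0 contributes nothing
                    }
                    double[] delta = new double[K];   // vector-valued difference delta_i
                    double norm2 = 0;
                    for (int k = 0; k < K; k++) {
                        delta[k] = img[uu][vv][k] - img[u][v][k];
                        norm2 += delta[k] * delta[k];
                    }
                    double w = g(Math.sqrt(norm2), kappa);   // one weight shared by all channels
                    for (int k = 0; k < K; k++) {
                        acc[k] += w * delta[k];
                    }
                }
                for (int k = 0; k < K; k++) {
                    out[u][v][k] = img[u][v][k] + alpha * acc[k];
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Two pixels with the same channel sum but different hue: red vs. green.
        double[][][] img = {{{100, 0, 0}}, {{0, 100, 0}}};
        double[][][] res = step(img, 0.2, 25);
        System.out.println(java.util.Arrays.toString(res[0][0]));
        System.out.println(java.util.Arrays.toString(res[1][0]));
    }
}
```

In the example, the red and green pixels differ by a color vector of norm $\sqrt{2}\cdot 100 \approx 141$, so $g$ is about $e^{-32}$ and the edge is practically untouched, although both pixels have the same channel sum. Because every channel is weighted by the same $g(\|\delta_i\|)$, the channels are coupled as described above.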


Examples 

Figure 17.20 shows the results of applying the Perona-Malik filter to a color image, using different modalities to control the diffusion process. In Fig. 17.20(a), the scalar (grayscale) diffusion filter (described in Alg. 17.7) is applied separately to each color channel. In Fig. 17.20(b), the diffusion process is coupled over all three color channels and controlled by the brightness gradient, as specified in Eqn. (17.56). Finally, in Fig. 17.20(c), the color gradient is used to control the common diffusion process, as defined in Eqn. (17.61) and Alg. 17.8. In each case, $T = 10$ diffusion iterations were applied, with update rate $\alpha = 0.20$, smoothness $\kappa = 25$, and conductivity function $g_1(d)$. The example demonstrates that, under otherwise equal conditions, edges and line structures are best preserved by the filter if the diffusion process is controlled by the color gradient.


17.3.4  Geometry  Preserving  Anisotropic  Diffusion 

Historically, the seminal publication by Perona and Malik [182] was followed by increased interest in the use of diffusion filters based on partial differential equations. Numerous different schemes were proposed, mainly with the aim of better adapting the diffusion process to the underlying image geometry.
Alg. 17.8

Anisotropic diffusion filter for color images based on the color gradient (see Ch. 16, Sec. 16.2.1). The conductivity function $g(d)$ may be chosen from the functions defined in Eqn. (17.52), or any similar function. Note that (unlike in Alg. 17.7) the maps $D_x, D_y$ are vector-valued.


1:  PeronaMalikColor(I, α, κ, T)
    Input: I, an RGB color image of size M × N; α, update rate; κ, smoothness parameter; T, number of iterations. Returns the modified image I.

2:    Specify the conductivity function: g(d) := exp(−(d/κ)²)      ▷ for example, see alternatives in Eq. 17.52
3:    (M, N) ← Size(I)
4:    Create maps D_x, D_y: M × N → ℝ³;  S_x, S_y: M × N → ℝ
5:    for n ← 1, …, T do                                             ▷ perform T iterations
6:      for all (u, v) ∈ M × N do                                     ▷ re-calculate gradients
7:        D_x(u, v) ← I(u+1, v) − I(u, v) if u < M−1, 0 otherwise
8:        D_y(u, v) ← I(u, v+1) − I(u, v) if v < N−1, 0 otherwise
9:        S_x(u, v) ← ‖D_x(u, v)‖²                                   ▷ = I²_{R,x} + I²_{G,x} + I²_{B,x}
10:       S_y(u, v) ← ‖D_y(u, v)‖²                                   ▷ = I²_{R,y} + I²_{G,y} + I²_{B,y}
11:     for all (u, v) ∈ M × N do
12:       s_0 ← S_x(u, v),  Δ_0 ← D_x(u, v)
13:       s_1 ← S_y(u, v),  Δ_1 ← D_y(u, v)
14:       s_2 ← 0,  Δ_2 ← 0
15:       s_3 ← 0,  Δ_3 ← 0
16:       if u > 0 then
17:         s_2 ← S_x(u−1, v)
18:         Δ_2 ← −D_x(u−1, v)
19:       if v > 0 then
20:         s_3 ← S_y(u, v−1)
21:         Δ_3 ← −D_y(u, v−1)
22:       I(u, v) ← I(u, v) + α · Σ_{k=0..3} g(√s_k) · Δ_k             ▷ update the image
23:   return I


Generalized  divergence-based  formulation 

Weickert [249, 250] generalized the divergence-based formulation of the Perona-Malik approach (see Eqn. (17.49)), that is,

$$\frac{\partial I}{\partial t} = \mathrm{div}(c\cdot\nabla I),$$

by replacing the time-varying, scalar diffusivity field $c(x,t) \in \mathbb{R}$ by a diffusion tensor field $D(x,t) \in \mathbb{R}^{2\times 2}$ in the form

$$\frac{\partial I}{\partial t} = \mathrm{div}(D\cdot\nabla I). \tag{17.62}$$

The time-varying tensor field $D(x,t)$ specifies a symmetric, positive-definite $2\times 2$ matrix for each 2D image position $x$ and time $t$ (i.e., $D: \mathbb{R}^3 \to \mathbb{R}^{2\times 2}$ in the continuous case). Geometrically, $D$ specifies an oriented, stretched ellipse that controls the local diffusion process. $D$ may be independent of the image $I$ but is typically derived from it. For example, the original Perona-Malik diffusion equation could be (trivially) written in the form¹²
12 $I_2$ denotes the $2\times 2$ identity matrix.


(a)  Color  channels  filtered  separately 


(b)  Diffusion  controlled  by  the  local  brightness  gradient 


(c)  Diffusion  controlled  by  the  local  color  gradient 


17.3  Anisotropic 
Diffusion  Filters 


Fig. 17.20

Perona-Malik color example. Scalar diffusion filter applied separately to each color channel (a); diffusion controlled by the brightness gradient (b); diffusion controlled by the color gradient (c). Common settings are $T = 10$, $\alpha = 0.20$, $g(d) = g_1(d)$, $\kappa = 25$; original image in Fig. 17.3(a).


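$$\frac{\partial I}{\partial t} = \mathrm{div}\left(\begin{pmatrix} c & 0\\ 0 & c\end{pmatrix}\cdot\nabla I\right) = \mathrm{div}\bigl((c\cdot I_2)\cdot\nabla I\bigr), \tag{17.63}$$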


where $c = g(\|\nabla I(x,t)\|)$ (see Eqn. (17.51)), and thus $D$ is coupled to the image content. In Weickert's approach, $D$ is constructed from the eigenvalues of the local "image structure tensor" [251], which we have encountered under different names in several places. This approach was also adapted to work with color images [252].


Trace-based  formulation 

Similar to the work of Weickert, the approach proposed by Tschumperle and Deriche [233, 235] also pursues a geometry-oriented generalization of anisotropic diffusion. The approach is directly aimed at vector-valued (color) images, but can also be applied to single-channel (scalar-valued) images. For a vector-valued image $I = (I_1, \ldots, I_K)$, the smoothing process is specified as

$$\frac{\partial I_k}{\partial t} = \mathrm{trace}(A\cdot H_k), \tag{17.64}$$

for each channel $k$, where $H_k$ denotes the Hessian matrix of the scalar-valued image function of channel $I_k$, and $A$ is a square ($2\times 2$ for 2D images) matrix that depends on the complete image $I$ and adapts the smoothing process to the local image geometry. Note that $A$ is the same for all image channels. Since the trace of the Hessian matrix¹³ is the Laplacian of the corresponding function (i.e., $\mathrm{trace}(H_I) = \nabla^2 I$), the diffusion equation for the Perona-Malik filter (Eqn. (17.49)) can be written as follows if the spatial variation of $c$ is neglected (in general, $\mathrm{div}(c\cdot\nabla I) = c\cdot\nabla^2 I + \nabla c\cdot\nabla I$):

$$\frac{\partial I}{\partial t} = \mathrm{div}(c\cdot\nabla I) \approx c\cdot(\nabla^2 I) = \mathrm{trace}\bigl((c\cdot I_2)\cdot H_I\bigr) = \mathrm{trace}(c\cdot H_I). \tag{17.65}$$

In this case, $A = c\cdot I_2$, which merely applies the local scalar factor $c$ to the Hessian matrix $H_I$ (and thus to the resulting Laplacian). Although $c$ is derived from the local image (since $c = g(\|\nabla I(x,t)\|)$), it does not represent any geometric information.


17.3.5  Tschumperle-Deriche  Algorithm 

This is different in the trace-based approach proposed by Tschumperle and Deriche [233, 235], where the matrix $A$ in Eqn. (17.64) is given by the expression

$$A = f_1(\lambda_1,\lambda_2)\cdot(q_2\cdot q_2^\top) + f_2(\lambda_1,\lambda_2)\cdot(q_1\cdot q_1^\top), \tag{17.66}$$

where $\lambda_1, \lambda_2$ and $q_1, q_2$ are the eigenvalues and normalized eigenvectors, respectively, of the (smoothed) $2\times 2$ structure matrix

$$G = \sum_{k=1}^{K} (\nabla I_k)\cdot(\nabla I_k)^\top, \tag{17.67}$$

with $\nabla I_k$ denoting the local gradient vector in image channel $I_k$. The functions $f_1(), f_2()$, which are defined in Eqn. (17.79), use the two eigenvalues to control the diffusion strength along the dominant direction of the contours ($f_1$) and perpendicular to it ($f_2$). Since the resulting algorithm is more involved than most previous ones, we describe it in more detail than usual.

Given a vector-valued image $I: M\times N \to \mathbb{R}^K$, the following steps are performed in each iteration of the algorithm:


Step  1: 

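Calculate the gradient at each image position $u = (u, v)$,

$$\nabla I_k(u) = \begin{pmatrix} \frac{\partial I_k}{\partial x}(u)\\[2pt] \frac{\partial I_k}{\partial y}(u)\end{pmatrix} = \begin{pmatrix} I_{k,x}(u)\\ I_{k,y}(u)\end{pmatrix}, \tag{17.68}$$

for each color channel $k = 1, \ldots, K$.¹⁴ The first derivatives of the gradient vector $\nabla I_k$ are estimated by convolving the image with the kernels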


13 See Sec. C.2.6 in the Appendix for details.

14 Note that $\nabla I_k(u)$ in Eqn. (17.68) is a 2D, vector-valued function, that is, a dedicated vector is calculated for every image position $u = (u, v)$. For better readability, we omit the spatial coordinate $(u)$ in the following and simply write $\nabla I_k$ instead of $\nabla I_k(u)$. Analogously, all related vectors and matrices defined below (including the vectors $q_1, q_2$ and the matrices $G$, $\bar{G}$, $A$, and $H_k$) are also calculated for each image point $u$, without the spatial coordinate being explicitly given.
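$$H^{\nabla}_x = \begin{pmatrix} -a & 0 & a\\ -b & 0 & b\\ -a & 0 & a\end{pmatrix}, \qquad H^{\nabla}_y = \begin{pmatrix} -a & -b & -a\\ 0 & 0 & 0\\ a & b & a\end{pmatrix}, \tag{17.69}$$

with $a = (2-\sqrt{2})/4$ and $b = (\sqrt{2}-1)/2$ (such that $2a + b = 1/2$).¹⁵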
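A small Java sketch of the kernels in Eqn. (17.69), applied at one interior pixel of a single channel. It assumes the filter form $\sum_{i,j} I(u+i, v+j)\cdot H(i,j)$ without kernel mirroring, so that positive results mean intensity increasing in $x$ or $y$; the class name and the test ramp are hypothetical, and border handling is omitted. Because $2a + b = 1/2$, the kernels return the exact slope of a linear ramp.

```java
/** Hypothetical sketch: the derivative kernels of Eqn. (17.69) applied at a single pixel. */
final class GradientKernels {

    static final double A = (2 - Math.sqrt(2)) / 4;   // a
    static final double B = (Math.sqrt(2) - 1) / 2;   // b, so that 2a + b = 1/2

    // Kernel rows correspond to y offsets -1, 0, +1; columns to x offsets -1, 0, +1.
    static final double[][] HX = {
        {-A, 0, A},
        {-B, 0, B},
        {-A, 0, A}};
    static final double[][] HY = {
        {-A, -B, -A},
        { 0,  0,  0},
        { A,  B,  A}};

    /** Sum of img[u+i][v+j] * h[j+1][i+1] (filter form, kernel not mirrored); (u, v) must be interior. */
    static double apply(double[][] img, double[][] h, int u, int v) {
        double sum = 0;
        for (int j = -1; j <= 1; j++) {
            for (int i = -1; i <= 1; i++) {
                sum += img[u + i][v + j] * h[j + 1][i + 1];
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        double[][] ramp = new double[5][5];   // ramp[u][v] = 3u + 5v
        for (int u = 0; u < 5; u++) {
            for (int v = 0; v < 5; v++) {
                ramp[u][v] = 3 * u + 5 * v;
            }
        }
        System.out.println("2a + b = " + (2 * A + B));
        System.out.println("I_x = " + apply(ramp, HX, 2, 2) + ", I_y = " + apply(ramp, HY, 2, 2));
    }
}
```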


Step  2: 

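Smooth the channel gradients $I_{k,x}, I_{k,y}$ with an isotropic 2D Gaussian filter kernel $H^{G,\sigma_g}$ of radius $\sigma_g$,

$$\bar{\nabla} I_k = \begin{pmatrix}\bar{I}_{k,x}\\ \bar{I}_{k,y}\end{pmatrix} = \begin{pmatrix} I_{k,x} * H^{G,\sigma_g}\\ I_{k,y} * H^{G,\sigma_g}\end{pmatrix}, \tag{17.70}$$

for each image channel $k = 1, \ldots, K$. In practice, this step is usually skipped by setting $\sigma_g = 0$.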


Step  3: 

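Calculate the Hessian matrix (see Sec. C.2.6 in the Appendix) for each image channel $I_k$, $k = 1, \ldots, K$, that is,

$$H_k = \begin{pmatrix} \frac{\partial^2 I_k}{\partial x^2} & \frac{\partial^2 I_k}{\partial x\,\partial y}\\[2pt] \frac{\partial^2 I_k}{\partial y\,\partial x} & \frac{\partial^2 I_k}{\partial y^2}\end{pmatrix} = \begin{pmatrix} I_{k,xx} & I_{k,xy}\\ I_{k,xy} & I_{k,yy}\end{pmatrix} = \begin{pmatrix} I_k * H^{\nabla}_{xx} & I_k * H^{\nabla}_{xy}\\ I_k * H^{\nabla}_{xy} & I_k * H^{\nabla}_{yy}\end{pmatrix}, \tag{17.71}$$

using the filter kernels

$$H^{\nabla}_{xx} = \begin{pmatrix} 1 & -2 & 1\end{pmatrix}, \qquad H^{\nabla}_{yy} = \begin{pmatrix} 1\\ -2\\ 1\end{pmatrix}, \qquad H^{\nabla}_{xy} = \frac{1}{4}\begin{pmatrix} 1 & 0 & -1\\ 0 & 0 & 0\\ -1 & 0 & 1\end{pmatrix}. \tag{17.72}$$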
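The Hessian kernels of Eqn. (17.72) can be sketched the same way; again the filter form without mirroring is assumed, only interior pixels are handled, and the class and test image are hypothetical. For the quadratic test image $I(u,v) = 2u^2 + 3uv + v^2$, the exact second derivatives are $I_{xx} = 4$, $I_{xy} = 3$, $I_{yy} = 2$, which these kernels reproduce.

```java
/** Hypothetical sketch: Hessian matrix of one channel via the kernels of Eqn. (17.72). */
final class HessianKernels {

    // Rows correspond to y offsets -1, 0, +1; columns to x offsets -1, 0, +1.
    static final double[][] HXX = {{0, 0, 0}, {1, -2, 1}, {0, 0, 0}};
    static final double[][] HYY = {{0, 1, 0}, {0, -2, 0}, {0, 1, 0}};
    static final double[][] HXY = {
        { 0.25, 0, -0.25},
        { 0,    0,  0   },
        {-0.25, 0,  0.25}};

    /** Sum of img[u+i][v+j] * h[j+1][i+1] (filter form, kernel not mirrored). */
    static double apply(double[][] img, double[][] h, int u, int v) {
        double sum = 0;
        for (int j = -1; j <= 1; j++) {
            for (int i = -1; i <= 1; i++) {
                sum += img[u + i][v + j] * h[j + 1][i + 1];
            }
        }
        return sum;
    }

    /** Returns {{I_xx, I_xy}, {I_xy, I_yy}} at interior position (u, v), cf. Eqn. (17.71). */
    static double[][] hessian(double[][] img, int u, int v) {
        double ixx = apply(img, HXX, u, v);
        double ixy = apply(img, HXY, u, v);
        double iyy = apply(img, HYY, u, v);
        return new double[][] {{ixx, ixy}, {ixy, iyy}};
    }

    public static void main(String[] args) {
        double[][] img = new double[5][5];   // img[u][v] = 2u^2 + 3uv + v^2
        for (int u = 0; u < 5; u++) {
            for (int v = 0; v < 5; v++) {
                img[u][v] = 2 * u * u + 3 * u * v + v * v;
            }
        }
        System.out.println(java.util.Arrays.deepToString(hessian(img, 2, 2)));
    }
}
```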


Step  4: 

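Calculate the local variation (structure) matrix as

$$G = \begin{pmatrix} G_0 & G_1\\ G_1 & G_2\end{pmatrix} = \sum_{k=1}^{K} (\bar{\nabla} I_k)\cdot(\bar{\nabla} I_k)^\top = \sum_{k=1}^{K}\begin{pmatrix} \bar{I}^2_{k,x} & \bar{I}_{k,x}\bar{I}_{k,y}\\ \bar{I}_{k,x}\bar{I}_{k,y} & \bar{I}^2_{k,y}\end{pmatrix} = \begin{pmatrix} \sum_{k=1}^{K} \bar{I}^2_{k,x} & \sum_{k=1}^{K} \bar{I}_{k,x}\bar{I}_{k,y}\\ \sum_{k=1}^{K} \bar{I}_{k,x}\bar{I}_{k,y} & \sum_{k=1}^{K} \bar{I}^2_{k,y}\end{pmatrix}, \tag{17.73}$$

for each image position $u$. Note that the matrix $G$ is symmetric (and positive semidefinite). In particular, for an RGB color image this is (coordinates $u$ again omitted)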


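$$\begin{aligned} G &= \begin{pmatrix} \bar{I}^2_{R,x} & \bar{I}_{R,x}\bar{I}_{R,y}\\ \bar{I}_{R,x}\bar{I}_{R,y} & \bar{I}^2_{R,y}\end{pmatrix} + \begin{pmatrix} \bar{I}^2_{G,x} & \bar{I}_{G,x}\bar{I}_{G,y}\\ \bar{I}_{G,x}\bar{I}_{G,y} & \bar{I}^2_{G,y}\end{pmatrix} + \begin{pmatrix} \bar{I}^2_{B,x} & \bar{I}_{B,x}\bar{I}_{B,y}\\ \bar{I}_{B,x}\bar{I}_{B,y} & \bar{I}^2_{B,y}\end{pmatrix}\\ &= \begin{pmatrix} \bar{I}^2_{R,x} + \bar{I}^2_{G,x} + \bar{I}^2_{B,x} & \bar{I}_{R,x}\bar{I}_{R,y} + \bar{I}_{G,x}\bar{I}_{G,y} + \bar{I}_{B,x}\bar{I}_{B,y}\\ \bar{I}_{R,x}\bar{I}_{R,y} + \bar{I}_{G,x}\bar{I}_{G,y} + \bar{I}_{B,x}\bar{I}_{B,y} & \bar{I}^2_{R,y} + \bar{I}^2_{G,y} + \bar{I}^2_{B,y}\end{pmatrix}. \end{aligned} \tag{17.74}$$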
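A minimal Java sketch of Eqns. (17.73) and (17.74) for a single pixel, given the (possibly smoothed) gradient of each channel. The class StructureMatrix and the sample gradients are hypothetical. In the sample, the red and green $x$-derivatives have opposite signs: they would cancel in a summed brightness gradient but add up in $G_0 = 2$, which is why the joint structure matrix retains edges between colors of similar brightness.

```java
/** Hypothetical sketch: the structure matrix G of Eqn. (17.73) at one pixel. */
final class StructureMatrix {

    /**
     * grad[k] = {I_kx, I_ky} holds the (possibly smoothed) gradient of channel k;
     * returns the three distinct entries {G0, G1, G2} of the symmetric matrix G.
     */
    static double[] of(double[][] grad) {
        double g0 = 0, g1 = 0, g2 = 0;
        for (double[] d : grad) {
            g0 += d[0] * d[0];   // sum of I_kx^2
            g1 += d[0] * d[1];   // sum of I_kx * I_ky
            g2 += d[1] * d[1];   // sum of I_ky^2
        }
        return new double[] {g0, g1, g2};
    }

    public static void main(String[] args) {
        // RGB example (Eqn. 17.74): R and G vary in opposite directions along x.
        double[][] grad = {{1, 2}, {-1, 0}, {0, -1}};
        double[] g = of(grad);
        System.out.println("G0 = " + g[0] + ", G1 = " + g[1] + ", G2 = " + g[2]);
        System.out.println("det(G) = " + (g[0] * g[2] - g[1] * g[1]));
    }
}
```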

15 Any other common set of x/y gradient kernels (e.g., Sobel masks) could be used instead, but these filters have better rotation invariance than their traditional counterparts. Similar kernels (with $a = 3/32$, $b = 10/32$) were proposed by Jähne in [126, p. 353].
Step 5:

Smooth the elements of the structure matrix $G$ using an isotropic Gaussian filter kernel $H^{G,\sigma_s}$ of radius $\sigma_s$, that is,

$$\bar{G} = \begin{pmatrix}\bar{G}_0 & \bar{G}_1\\ \bar{G}_1 & \bar{G}_2\end{pmatrix} = \begin{pmatrix} G_0 * H^{G,\sigma_s} & G_1 * H^{G,\sigma_s}\\ G_1 * H^{G,\sigma_s} & G_2 * H^{G,\sigma_s}\end{pmatrix}. \tag{17.75}$$


Step  6: 

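For each image position $u$, calculate the eigenvalues $\lambda_1, \lambda_2$ of the smoothed $2\times 2$ matrix $\bar{G}$, such that $\lambda_1 \geq \lambda_2$, and the corresponding normalized eigenvectors¹⁶

$$q_1 = \begin{pmatrix}\hat{x}_1\\ \hat{y}_1\end{pmatrix}, \qquad q_2 = \begin{pmatrix}\hat{x}_2\\ \hat{y}_2\end{pmatrix},$$

such that $\|q_1\| = \|q_2\| = 1$. Note that $q_1$ points in the direction of maximum change and $q_2$ points in the perpendicular direction, that is, along the edge tangent. Thus, smoothing should occur predominantly along $q_2$. Since $q_1$ and $q_2$ are normal to each other, we can express $q_2$ in terms of $q_1$, for example,

$$q_2 = \begin{pmatrix}\hat{x}_2\\ \hat{y}_2\end{pmatrix} = \begin{pmatrix}-\hat{y}_1\\ \hat{x}_1\end{pmatrix}. \tag{17.76}$$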
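For Step 6, the eigensystem of a symmetric $2\times 2$ matrix has a closed form. The sketch below uses $\lambda_{1,2} = \frac{G_0+G_2}{2} \pm \sqrt{\bigl(\frac{G_0-G_2}{2}\bigr)^2 + G_1^2}$ together with one of two equivalent eigenvector formulas. It is an assumed stand-in for illustration and is not necessarily organized like the procedure RealEigenValues2x2 of Sec. B.4.1; the class and method names are hypothetical.

```java
/** Hypothetical sketch: eigenvalues/eigenvectors of the symmetric 2x2 matrix [[g0, g1], [g1, g2]]. */
final class Eigen2x2 {

    /**
     * Returns {lambda1, lambda2, x1, y1} with lambda1 >= lambda2 and (x1, y1) the
     * normalized eigenvector for lambda1; the second eigenvector is (-y1, x1), Eqn. (17.76).
     */
    static double[] eigen(double g0, double g1, double g2) {
        double mean = 0.5 * (g0 + g2);
        double root = Math.sqrt(0.25 * (g0 - g2) * (g0 - g2) + g1 * g1);
        double lambda1 = mean + root;
        double lambda2 = mean - root;
        double x, y;
        if (g0 >= g2) {          // pick the numerically safer of two equivalent formulas
            x = lambda1 - g2;
            y = g1;
        } else {
            x = g1;
            y = lambda1 - g0;
        }
        double len = Math.hypot(x, y);
        if (len == 0) {          // isotropic case (g1 = 0, g0 = g2): any direction works
            x = 1;
            y = 0;
            len = 1;
        }
        return new double[] {lambda1, lambda2, x / len, y / len};
    }

    public static void main(String[] args) {
        double[] e = eigen(10, 2, 5);
        double x1 = e[2], y1 = e[3];
        System.out.printf("lambda1 = %.4f, lambda2 = %.4f%n", e[0], e[1]);
        System.out.printf("q1 = (%.4f, %.4f), q2 = (%.4f, %.4f)%n", x1, y1, -y1, x1);
    }
}
```

A quick plausibility check for any such routine: $\lambda_1 + \lambda_2$ must equal the trace $G_0 + G_2$, and $\lambda_1\lambda_2$ the determinant $G_0 G_2 - G_1^2$.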


Step  7: 

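From the eigenvalues $(\lambda_1, \lambda_2)$ and the normalized eigenvectors $(q_1, q_2)$ of $\bar{G}$, compose the symmetric matrix $A$ in the form

$$A = \begin{pmatrix} A_0 & A_1\\ A_1 & A_2\end{pmatrix} = \underbrace{f_1(\lambda_1,\lambda_2)}_{c_1}\cdot(q_2\cdot q_2^\top) + \underbrace{f_2(\lambda_1,\lambda_2)}_{c_2}\cdot(q_1\cdot q_1^\top) = c_1\cdot\begin{pmatrix}\hat{y}_1^2 & -\hat{x}_1\hat{y}_1\\ -\hat{x}_1\hat{y}_1 & \hat{x}_1^2\end{pmatrix} + c_2\cdot\begin{pmatrix}\hat{x}_1^2 & \hat{x}_1\hat{y}_1\\ \hat{x}_1\hat{y}_1 & \hat{y}_1^2\end{pmatrix} \tag{17.77}$$

$$A = \begin{pmatrix} c_1\hat{y}_1^2 + c_2\hat{x}_1^2 & (c_2 - c_1)\cdot\hat{x}_1\hat{y}_1\\ (c_2 - c_1)\cdot\hat{x}_1\hat{y}_1 & c_1\hat{x}_1^2 + c_2\hat{y}_1^2\end{pmatrix}, \tag{17.78}$$

using the conductivity coefficients

$$c_1 = f_1(\lambda_1,\lambda_2) = \frac{1}{(1 + \lambda_1 + \lambda_2)^{a_1}}, \qquad c_2 = f_2(\lambda_1,\lambda_2) = \frac{1}{(1 + \lambda_1 + \lambda_2)^{a_2}}, \tag{17.79}$$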


with fixed parameters $a_1, a_2 > 0$ to control the non-isotropy of the filter: $a_1$ specifies the amount of smoothing along contours, $a_2$ in the perpendicular direction (along the gradient). Small values of $a_1, a_2$ facilitate diffusion in the corresponding direction, while larger values inhibit smoothing. With $a_1$ close to zero, diffusion is practically unconstrained along the tangent direction. Typical default values are $a_1 = 0.5$ and $a_2 = 0.9$; results from other settings are shown in the examples.
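Step 7 can be sketched directly from Eqns. (17.78) and (17.79). The inputs (eigenvalues and unit eigenvector $q_1$ of $\bar{G}$), the sample values, and the class name are hypothetical; the diffusion parameters in the example are the typical defaults quoted above.

```java
/** Hypothetical sketch: geometry matrix A of Eqns. (17.77)-(17.79) from the eigen data of G-bar. */
final class GeometryMatrix {

    /** Returns the distinct entries {A0, A1, A2}; (x1, y1) is the unit eigenvector q1. */
    static double[] of(double lambda1, double lambda2, double x1, double y1, double a1, double a2) {
        double s = 1 + lambda1 + lambda2;
        double c1 = 1 / Math.pow(s, a1);   // f1: conductivity along the contour (direction q2)
        double c2 = 1 / Math.pow(s, a2);   // f2: conductivity across the contour (direction q1)
        double m0 = c1 * y1 * y1 + c2 * x1 * x1;   // A0
        double m1 = (c2 - c1) * x1 * y1;           // A1
        double m2 = c1 * x1 * x1 + c2 * y1 * y1;   // A2
        return new double[] {m0, m1, m2};
    }

    public static void main(String[] args) {
        // assumed eigen data: lambda1 = 3, lambda2 = 1, q1 = (0.6, 0.8); a1 = 0.5, a2 = 0.9
        double[] a = of(3, 1, 0.6, 0.8, 0.5, 0.9);
        System.out.printf("A0 = %.5f, A1 = %.5f, A2 = %.5f%n", a[0], a[1], a[2]);
    }
}
```

By construction, $A\cdot q_1 = c_2\cdot q_1$ and $A\cdot q_2 = c_1\cdot q_2$, which is a convenient way to check such an implementation; with $a_1 < a_2$ and $1 + \lambda_1 + \lambda_2 > 1$, we get $c_1 > c_2$, so smoothing along the contour dominates.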
16 See Sec. B.4.1 in the Appendix for details on calculating the eigensystem of a $2\times 2$ matrix.


Step  8: 

Finally, each image channel $I_k$ is updated using the recurrence relation

$$\begin{aligned}
I_k &\leftarrow I_k + \alpha\cdot\mathrm{trace}(A\cdot H_k) = I_k + \alpha\cdot\beta_k && (17.80)\\
&= I_k + \alpha\cdot(A_0\cdot I_{k,xx} + A_1\cdot I_{k,xy} + A_1\cdot I_{k,yx} + A_2\cdot I_{k,yy}) && (17.81)\\
&= I_k + \alpha\cdot\underbrace{(A_0\cdot I_{k,xx} + 2\cdot A_1\cdot I_{k,xy} + A_2\cdot I_{k,yy})}_{\beta_k} && (17.82)
\end{aligned}$$

(since $I_{k,xy} = I_{k,yx}$). The term $\beta_k = \mathrm{trace}(A\cdot H_k)$ represents the local image velocity in channel $k$. Note that, although a separate Hessian matrix $H_k$ is calculated for each channel, the geometry matrix $A$ is the same for all image channels. The image is thus smoothed along a common image geometry that considers the correlation between color channels, since $A$ is derived from the joint structure matrix $G$ (Eqn. (17.74)) and therefore combines all $K$ color channels.

In each iteration, the factor $\alpha$ in Eqn. (17.82) is adjusted dynamically to the maximum current velocity $\beta_k$ in all channels in the form

$$\alpha = \frac{dt}{\max_{k,u} |\beta_k|} = \frac{dt}{\max_{k,u} |\mathrm{trace}(A\cdot H_k)|}, \tag{17.83}$$

where $dt$ is the (constant) "time increment" parameter. Thus the time step $\alpha$ is kept small as long as the image gradients (vector field velocities) are large. As smoothing proceeds, image gradients are reduced, and thus $\alpha$ typically increases over time. In the actual implementation, the values of $I_k$ (in Eqn. (17.82)) are hard-limited to the initial minimum and maximum.
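The final step can be sketched as follows: the velocity $\beta_k$ from the expanded trace of Eqn. (17.82), the adaptive factor $\alpha$ of Eqn. (17.83), and the hard-limited update. Pixels are assumed to be stored per channel as flat arrays img[k][i]; the bounds lo and hi stand for the initial minimum and maximum mentioned above, and returning $\alpha = 0$ when all velocities vanish is an assumed edge-case policy, not taken from the text. Class and method names are hypothetical.

```java
/** Hypothetical sketch of Step 8: velocities beta_k, adaptive step alpha, and clamped update. */
final class TschumperleDericheUpdate {

    /** beta_k = trace(A * H_k) = A0*I_xx + 2*A1*I_xy + A2*I_yy, Eqn. (17.82); a = {A0, A1, A2}. */
    static double velocity(double[] a, double[][] h) {
        return a[0] * h[0][0] + 2 * a[1] * h[0][1] + a[2] * h[1][1];
    }

    /**
     * img[k][i] and beta[k][i]: channel k, pixel index i. Uses alpha = dt / max|beta| (Eqn. 17.83),
     * updates img in place, hard-limits values to [lo, hi], and returns alpha.
     */
    static double update(double[][] img, double[][] beta, double dt, double lo, double hi) {
        double betaMax = 0;
        for (double[] channel : beta) {
            for (double b : channel) {
                betaMax = Math.max(betaMax, Math.abs(b));
            }
        }
        if (betaMax == 0) {
            return 0;   // nothing to update (assumed handling of this edge case)
        }
        double alpha = dt / betaMax;
        for (int k = 0; k < img.length; k++) {
            for (int i = 0; i < img[k].length; i++) {
                double val = img[k][i] + alpha * beta[k][i];
                img[k][i] = Math.min(hi, Math.max(lo, val));
            }
        }
        return alpha;
    }

    public static void main(String[] args) {
        double[] a = {0.3, -0.2, 0.7};
        double[][] h = {{1.5, 0.4}, {0.4, -2.0}};
        System.out.println("beta = " + velocity(a, h));
        double[][] img = {{10, 20}};
        double[][] beta = {{5, -10}};
        double alpha = update(img, beta, 20, 5, 255);
        System.out.println("alpha = " + alpha + ", img = " + java.util.Arrays.toString(img[0]));
    }
}
```

In the example, the largest velocity magnitude is 10, so $\alpha = 20/10 = 2$; the second pixel would drop to 0 and is therefore clamped to the lower bound 5.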

The steps (1–8) we have just outlined are repeated for the specified number of iterations. The complete procedure is summarized in Alg. 17.9, and a corresponding Java implementation can be found on the book's website (see Sec. 17.4).

Beyond this baseline algorithm, several variations and extensions of this filter exist, including the use of spatially adaptive, oriented smoothing filters.¹⁷ This type of filter has also been used with good results for image inpainting [234], where diffusion is applied to fill out only selected (masked) parts of the image where the content is unknown or should be removed.


Examples 

The example in Fig. 17.21 demonstrates the influence of image geometry and how the non-isotropy of the Tschumperle-Deriche filter can be controlled by varying the diffusion parameters $a_1, a_2$ (see Eqn. (17.79)). Parameter $a_1$, which specifies the diffusion in the direction of contours, is changed while $a_2$ (controlling the diffusion in the gradient direction) is held constant. In Fig. 17.21(a), smoothing along contours is modest and very small across edges with the default settings $a_1 = 0.5$ and $a_2 = 0.9$. With lower values of $a_1$, increased
17 A recent version was released by the original authors as part of the "GREYC's Magic Image Converter" open-source framework, which is also available as a GIMP plugin (http://gmic.sourceforge.net).
Alg. 17.9

Tschumperle-Deriche anisotropic diffusion filter for vector-valued (color) images. Typical settings are $T = 5, \ldots, 20$, $dt = 20$, $\sigma_g = 0$, $\sigma_s = 0.5$, $a_1 = 0.5$, $a_2 = 0.9$. See Sec. B.4.1 for a description of the procedure RealEigenValues2x2 (used in line 12).


1:  TschumperleDericheFilter(I, T, dt, σ_g, σ_s, a_1, a_2)
    Input: I = (I_1, …, I_K), color image of size M × N with K channels; T, number of iterations; dt, time increment; σ_g, width of the Gaussian kernel for smoothing the gradient; σ_s, width of the Gaussian kernel for smoothing the structure matrix; a_1, a_2, diffusion parameters for directions of min./max. variation, respectively. Returns the modified image I.

2:    Create maps:
        D: K × M × N → ℝ²          ▷ D(k, u, v) ≡ ∇I_k(u, v), gradient vector
        H: K × M × N → ℝ^(2×2)     ▷ H(k, u, v) ≡ H_k(u, v), Hessian matrix
        G: M × N → ℝ^(2×2)         ▷ structure matrix at (u, v)
        A: M × N → ℝ^(2×2)         ▷ geometry matrix at (u, v)
        B: K × M × N → ℝ           ▷ B(k, u, v) ≡ β_k(u, v), velocity
3:    for t ← 1, …, T do                                                ▷ perform T iterations
4:      for k ← 1, …, K and all coordinates (u, v) ∈ M × N do
5:        D(k, u, v) ← ( (I_k * H^∇_x)(u, v), (I_k * H^∇_y)(u, v) )ᵀ                      ▷ Eq. 17.68–17.69
6:        H(k, u, v) ← ( (I_k * H^∇_xx)(u, v)  (I_k * H^∇_xy)(u, v) ; (I_k * H^∇_xy)(u, v)  (I_k * H^∇_yy)(u, v) )   ▷ Eq. 17.71–17.72
7:      D ← D * H^(G,σ_g)                                               ▷ smooth elements of D over (u, v)
8:      for all coordinates (u, v) ∈ M × N do
9:        G(u, v) ← Σ_{k=1..K} ( (D_x(k,u,v))²  D_x(k,u,v)·D_y(k,u,v) ; D_x(k,u,v)·D_y(k,u,v)  (D_y(k,u,v))² )
10:     G ← G * H^(G,σ_s)                                               ▷ smooth elements of G over (u, v)
11:     for all coordinates (u, v) ∈ M × N do
12:       (λ_1, λ_2, q_1, q_2) ← RealEigenValues2x2(G(u, v))              ▷ p. 724
13:       (x̂_1, ŷ_1)ᵀ ← q_1 / ‖q_1‖                                      ▷ normalize 1st eigenvector (λ_1 ≥ λ_2)
14:       c_1 ← 1 / (1 + λ_1 + λ_2)^a_1,  c_2 ← 1 / (1 + λ_1 + λ_2)^a_2    ▷ Eq. 17.79
15:       A(u, v) ← ( c_1·ŷ_1² + c_2·x̂_1²  (c_2 − c_1)·x̂_1·ŷ_1 ; (c_2 − c_1)·x̂_1·ŷ_1  c_1·x̂_1² + c_2·ŷ_1² )   ▷ Eq. 17.78
16:     β_max ← −1
17:     for k ← 1, …, K and all (u, v) ∈ M × N do
18:       B(k, u, v) ← trace(A(u, v) · H(k, u, v))                        ▷ β_k: Eq. 17.82
19:       β_max ← max(β_max, |B(k, u, v)|)
20:     α ← dt / β_max                                                   ▷ Eq. 17.83
21:     for k ← 1, …, K and all (u, v) ∈ M × N do
22:       I_k(u, v) ← I_k(u, v) + α · B(k, u, v)                          ▷ update the image
23:   return I


blurring  occurs  in  the  direction  of  the  contours,  as  shown  in  Figs. 
17.21(b,  c). 

17.4  Java  Implementation 

Implementations of the filters described in this chapter are available as part of the imagingbook¹⁸ library at the book's website. The associated classes KuwaharaFilter, NagaoMatsuyamaFilter, PeronaMalikFilter, and TschumperleDericheFilter are based on
18 Package imagingbook.pub.edgepreservingfilters.




Fig. 17.21

Tschumperle-Deriche filter example. The non-isotropy of the filter can be adjusted by changing parameter $a_1$, which controls the diffusion along contours (see Eqn. (17.79)): $a_1 = 0.50, 0.25, 0.00$ (a–c). Parameter $a_2 = 0.90$ (constant) controls the diffusion in the direction of the gradient (perpendicular to contours). Remaining settings are $T = 20$, $dt = 20$, $\sigma_g = 0.5$, $\sigma_s = 0.5$ (see the description of Alg. 17.9); original image in Fig. 17.3(a).


the common super-class GenericFilter¹⁹ and define the following constructors:

KuwaharaFilter(Parameters p)

Creates a Kuwahara-type filter for grayscale and color images, as described in Sec. 17.1 (Alg. 17.2), with radius r (default 2) and variance threshold tsigma (denoted $t_\sigma$ in Alg. 17.2, default 0.0). The size of the resulting filter is $(2r + 1)\times(2r + 1)$.

BilateralFilter(Parameters p)

Creates a bilateral filter for grayscale and color images using Gaussian kernels, as described in Sec. 17.2 (see Algs. 17.4 and 17.5). Parameters sigmaD ($\sigma_d$, default 2.0) and sigmaR ($\sigma_r$, default 50.0) specify the widths of the domain and the range kernels, respectively. The type of norm for measuring color distances is specified by colorNormType (default is NormType.L2).

BilateralFilterSeparable(Parameters p)

Creates an x/y-separable bilateral filter for grayscale and color images (see Alg. 17.6). Constructor parameters are the same as for the class BilateralFilter above.


19 Package imagingbook.lib.filters. Filters of this type can be applied to images using the method applyTo(ImageProcessor ip), as described in Chapter 15, Sec. 15.3.
PeronaMalikFilter(Parameters p)

Creates an anisotropic diffusion filter for grayscale and color images (see Algs. 17.7 and 17.8). The key parameters and their default values are iterations ($T = 10$), alpha ($\alpha = 0.2$), kappa ($\kappa = 25$), smoothRegions (true), and colorMode (SeparateChannels). With smoothRegions = true, function $g_2$ is used to control conductivity, otherwise $g_1$ (see Eqn. (17.52)). For filtering color images, three different color modes can be specified for diffusion control: SeparateChannels, BrightnessGradient, or ColorGradient. See Prog. 17.1 for an example of using this class in a simple ImageJ plugin.

TschumperleDericheFilter(Parameters p)

Creates an anisotropic diffusion filter for color images, as described in Sec. 17.3.5 (Alg. 17.9). Parameters and default values are iterations ($T = 20$), dt ($dt = 20$), sigmaG ($\sigma_g = 0.0$), sigmaS ($\sigma_s = 0.5$), a1 ($a_1 = 0.25$), and a2 ($a_2 = 0.90$). Otherwise, the usage of this class is analogous to the example in Prog. 17.1.

All  default  values  pertain  to  the  parameterless  constructors  that  are 
also  available.  Note  that  these  filters  are  generic  and  can  be  applied 
to  grayscale  and  color  images  without  any  modification. 


17.5  Exercises 


Exercise 17.1. Implement a pure range filter (Eqn. (17.17)) for grayscale images, using a 1D Gaussian kernel

$$H_r(x) = \frac{1}{\sqrt{2\pi}\cdot\sigma}\cdot\exp\Bigl(-\frac{x^2}{2\sigma^2}\Bigr).$$

Investigate the effects of this filter upon the image and its histogram for $\sigma = 10$, 20, and 25.


Exercise 17.2. Modify the Kuwahara-type filter for color images in Alg. 17.3 to use the norm of the color covariance matrix (as defined in Eqn. (17.12)) for quantifying the amount of variation in each subregion. Estimate the number of additional calculations required for processing each image pixel. Implement the modified algorithm, and compare the results and execution times.

Exercise 17.3. Modify the separable bilateral filter algorithm (given in Alg. 17.6) to handle color images, using Alg. 17.5 as a starting point. Implement and test your algorithm, and compare the results (see also Fig. 17.14) and execution times.

Exercise 17.4. Verify (experimentally) that $n$ iterations of the diffusion process defined in Eqn. (17.45) have the same effect as a Gaussian filter of width $\sigma_n$, as stated in Eqn. (17.48). To determine the impulse response of the resulting diffusion filter, use an "impulse" test image, that is, a black (zero-valued) image with a single bright pixel at the center.
 1  import ij.ImagePlus;
 2  import ij.plugin.filter.PlugInFilter;
 3  import ij.process.ImageProcessor;
 4  import imagingbook...PeronaMalikFilter;
 5  import imagingbook...PeronaMalikFilter.ColorMode;
 6  import imagingbook...PeronaMalikFilter.Parameters;
 7
 8  public class Perona_Malik_Demo implements PlugInFilter {
 9
10      public int setup(String arg0, ImagePlus imp) {
11          return DOES_ALL + DOES_STACKS;
12      }
13
14      public void run(ImageProcessor ip) {
15          // create a parameter object:
16          Parameters params = new Parameters();
17
18          // modify filter settings if needed:
19          params.iterations = 20;
20          params.alpha = 0.15f;
21          params.kappa = 20.0f;
22          params.smoothRegions = true;
23          params.colorMode = ColorMode.ColorGradient;
24
25          // instantiate the filter object:
26          PeronaMalikFilter filter =
27              new PeronaMalikFilter(params);
28
29          // apply the filter (modifies ip in place):
30          filter.applyTo(ip);
31      }
32
33  }

Prog. 17.1

Perona-Malik filter (complete ImageJ plugin). Inside the run() method, a parameter object (instance of class PeronaMalikFilter.Parameters) is created in line 16. Individual parameters may then be modified, as shown in lines 19–23. This would typically be done by querying the user (e.g., with ImageJ's GenericDialog class). In line 27, a new instance of PeronaMalikFilter is created, the parameter object (params) being passed to the constructor as the only argument. Finally, in line 30, the filter is (destructively) applied to the input image, that is, ip is modified. ColorMode (in line 23) is implemented as an enumeration type within class PeronaMalikFilter, providing the options SeparateChannels (default), BrightnessGradient, and ColorGradient. Note that, as specified in the setup() method, this plugin works for any type of image and for image stacks.


Exercise 17.5. Use the signal-to-noise ratio (SNR) to measure the effectiveness of noise suppression by edge-preserving smoothing filters on grayscale images. Add synthetic Gaussian noise (see Sec. D.4.3 in the Appendix) to the original image $I$ to create a corrupted image $\tilde{I}$. Then apply the filter to $\tilde{I}$ to obtain $\bar{I}$. Finally, calculate $\mathrm{SNR}(I, \bar{I})$ as defined in Eqn. (13.2). Compare the SNR values obtained with various types of filters and different parameter settings, for example, for the Kuwahara filter (Alg. 17.2), the bilateral filter (Alg. 17.4), and the Perona-Malik anisotropic diffusion filter (Alg. 17.7). Analyze whether and how the SNR values relate to the perceived image quality.
Introduction to Spectral Techniques


The following three chapters deal with the representation and analysis of images in the frequency domain, based on the decomposition of image signals into sine and cosine functions using the well-known Fourier transform. Students often consider this a difficult topic, mainly because of its mathematical flavor and because its practical applications are not immediately obvious. Indeed, most common operations and methods in digital image processing can be sufficiently described in the original signal or image space without even mentioning spectral techniques. This is why we take up this topic relatively late in this text.

While spectral techniques were often used to improve the efficiency of image-processing operations, this has become less important due to the high power of modern computers. There exist, however, some important effects, concepts, and techniques in digital image processing that are considerably easier to describe in the frequency domain or cannot otherwise be understood at all. The topic should therefore not be avoided altogether. Fourier analysis not only has a very elegant (perhaps not always sufficiently appreciated) mathematical theory but, interestingly enough, also complements some important concepts we have seen earlier, in particular linear filters and linear convolution (see Chapter 5, Sec. 5.2). Equally important are applications of spectral techniques in many popular methods for image and video compression, and they provide valuable insight into the mechanisms of sampling (discretization) of continuous signals as well as the reconstruction and interpolation of discrete signals.

In the following, we first give a basic introduction to the concepts of frequency and spectral decomposition that tries to be minimally formal and thus should be easily "digestible" even for readers without previous exposure to this topic. We start with the representation of 1D signals and will then extend the discussion to 2D signals (images) in the next chapter. Subsequently, Chapter 20 briefly explains the discrete cosine transform, a popular variant of the discrete Fourier transform that is frequently used in image compression.
Fig. 18.1

Cosine and sine functions of different frequency. The expression $\cos(\omega x)$ describes a cosine function with angular frequency $\omega$ at position $x$. The angular frequency $\omega$ of this periodic function corresponds to a cycle length (period) $T = 2\pi/\omega$. For $\omega = 1$, the period is $T_1 = 2\pi$ (a), and for $\omega = 3$ it is $T_3 = 2\pi/3 \approx 2.0944$ (b). The same holds for the sine function $\sin(\omega x)$.
18.1 The Fourier Transform

The  concept  of  frequency  and  the  decomposition  of  waveforms  into 
elementary  “harmonic”  functions  first  arose  in  the  context  of  music 
and  sound.  The  idea  of  describing  acoustic  events  in  terms  of  “pure” 
sinusoidal  functions  does  not  seem  unreasonable,  considering  that 
sine  waves  appear  naturally  in  every  form  of  oscillation  (e.g.,  on  a 
free-swinging  pendulum). 

18.1.1  Sine  and  Cosine  Functions 

The well-known cosine function,

$$f(x) = \cos(x), \tag{18.1}$$

has the value 1 at the origin ($\cos(0) = 1$) and performs exactly one full cycle between the origin and the point $x = 2\pi$ (Fig. 18.1(a)). We say that the function is periodic with a cycle length (period) $T = 2\pi$; that is,

$$\cos(x) = \cos(x + 2\pi) = \cos(x + 4\pi) = \cdots = \cos(x + k\,2\pi), \tag{18.2}$$

for any $k \in \mathbb{Z}$. The same is true for the corresponding sine function, except that its value is zero at the origin (since $\sin(0) = 0$).


Frequency  and  amplitude 

The number of oscillations of $\cos(x)$ over the distance $T = 2\pi$ is one
and thus the value of the angular frequency is

$$ \omega = \frac{2\pi}{T} = 1. \tag{18.3} $$

If we modify the cosine function in Eqn. (18.1) to

$$ f(x) = \cos(3x), \tag{18.4} $$

we obtain a compressed cosine wave that oscillates three times faster
than the original function $\cos(x)$ (see Fig. 18.1(b)). The function
$\cos(3x)$ performs three full cycles over a distance of $2\pi$ and thus has
the angular frequency $\omega = 3$ and a period $T = 2\pi/3$. In general, the
period $T$ relates to the angular frequency $\omega$ as

$$ T = \frac{2\pi}{\omega}, \tag{18.5} $$

for $\omega > 0$. A sine or cosine function oscillates between peak values
$+1$ and $-1$, and its amplitude is 1. Multiplying by a constant $a \in \mathbb{R}$
changes the peak values of the function to $\pm a$ and its amplitude to
$|a|$. In general, the expressions

$$ a \cdot \cos(\omega x) \quad\text{and}\quad a \cdot \sin(\omega x) $$


denote a cosine or sine function, respectively, with amplitude $a$ and
angular frequency $\omega$, evaluated at position (or point in time) $x$. The
relation between the angular frequency $\omega$ and the “common” frequency
$f$ is given by

$$ f = \frac{1}{T} = \frac{\omega}{2\pi} \quad\text{or}\quad \omega = 2\pi f, \tag{18.6} $$

respectively, where $f$ is measured in cycles per length or time unit.¹
In the following, we use either $\omega$ or $f$ as appropriate, and the meaning
should always be clear from the symbol used.


Phase 

Shifting a cosine function along the $x$ axis by a distance $\varphi$,

$$ \cos(x) \rightarrow \cos(x - \varphi), $$

changes the phase of the cosine wave, and $\varphi$ denotes the phase angle
of the resulting function. Thus a sine function is really just a cosine
function shifted to the right² by a quarter period ($\varphi = \frac{2\pi}{4} = \frac{\pi}{2}$), so

$$ \sin(\omega x) = \cos\left(\omega x - \frac{\pi}{2}\right). \tag{18.7} $$

If we take the cosine function as the reference with phase $\varphi_{\cos} = 0$,
then the phase angle of the corresponding sine function is $\varphi_{\sin} = \frac{\pi}{2} = 90^\circ$.

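Cosine and sine functions are “orthogonal” in a sense, and we can
use this fact to create new “sinusoidal” functions with arbitrary
frequency, phase, and amplitude. In particular, adding a cosine and a
sine function with identical frequencies $\omega$ and arbitrary amplitudes
$A$ and $B$, respectively, creates another sinusoid:

$$ A \cdot \cos(\omega x) + B \cdot \sin(\omega x) = C \cdot \cos(\omega x - \varphi). \tag{18.8} $$

The resulting amplitude $C$ and the phase angle $\varphi$ are determined solely
by the two original amplitudes $A$ and $B$ as

$$ C = \sqrt{A^2 + B^2} \quad\text{and}\quad \varphi = \tan^{-1}\!\left(\frac{B}{A}\right). \tag{18.9} $$

(Strictly, $\tan^{-1}(B/A)$ yields the correct angle only for $A > 0$; in
general, the two-argument (four-quadrant) arctangent of $B$ and $A$ should
be used.)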

Figure 18.2(a) shows an example with amplitudes $A = B = 0.5$ and
a resulting phase angle $\varphi = 45^\circ$.
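As a quick numerical check of Eqns. (18.8) and (18.9), the following Java sketch computes $C$ and $\varphi$ for the example of Fig. 18.2(a) and compares both sides of Eqn. (18.8) at a number of positions $x$. The class name `SinusoidSum` is hypothetical and not part of the book's code. Using `Math.atan2(B, A)` instead of a plain arctangent is our own choice; it keeps the phase in the correct quadrant when $A$ is negative.

```java
/** Hypothetical helper: numerical check of Eqns. (18.8) and (18.9). */
class SinusoidSum {

    /** Amplitude C = sqrt(A^2 + B^2) of A*cos(wx) + B*sin(wx). */
    static double amplitude(double a, double b) {
        return Math.hypot(a, b);
    }

    /** Phase angle phi in radians; atan2 picks the correct quadrant for any signs of A and B. */
    static double phase(double a, double b) {
        return Math.atan2(b, a);
    }

    public static void main(String[] args) {
        double a = 0.5, b = 0.5, omega = 3;
        double c = amplitude(a, b);
        double phi = phase(a, b);
        System.out.printf("C = %.4f, phi = %.1f deg%n", c, Math.toDegrees(phi));

        // Compare the left- and right-hand sides of Eqn. (18.8) over one period of cos(x)
        double maxErr = 0;
        for (double x = 0; x <= 2 * Math.PI; x += 0.1) {
            double lhs = a * Math.cos(omega * x) + b * Math.sin(omega * x);
            double rhs = c * Math.cos(omega * x - phi);
            maxErr = Math.max(maxErr, Math.abs(lhs - rhs));
        }
        System.out.println("max deviation = " + maxErr);
    }
}
```

For $A = B = 0.5$, this should print an amplitude of about 0.7071, a phase of 45 degrees, and a deviation on the order of floating-point rounding.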

1 For example, a temporal oscillation with frequency $f = 1000$ cycles/s
(Hertz) has the period $T = 1/1000$ s and therefore the angular frequency
$\omega = 2000\pi$ radians per second. Since radians are dimensionless, $\omega$ has
the same physical unit ($\mathrm{s}^{-1}$) as $f$.

2  In  general,  the  function  f(x  —  d)  is  the  original  function  f(x)  shifted  to 
the  right  by  a  distance  d. 
Fig. 18.2

Adding cosine and sine functions with identical frequencies, $A \cdot \cos(\omega x) + B \cdot \sin(\omega x)$, with $\omega = 3$ and $A = B = 0.5$. The result is a phase-shifted cosine function (dotted curve) with amplitude $C = \sqrt{0.5^2 + 0.5^2} \approx 0.707$ and phase angle $\varphi = 45^\circ$ (a). If the cosine and sine components are treated as orthogonal vectors $(A, B)$ in 2-space, the amplitude and phase of the resulting sinusoid ($C$) can be easily determined by vector summation (b).

[Plot panels: (a) $A \cdot \cos(\omega x) + B \cdot \sin(\omega x)$; (b) vector diagram of the cosine and sine components.]

Complex-valued sine functions: Euler’s notation

Figure 18.2(b) depicts the contributing cosine and sine components
of the new function as a pair of orthogonal vectors in 2-space whose
lengths correspond to the amplitudes $A$ and $B$. Not coincidentally,
this reminds us of the representation of real and imaginary components
of complex numbers,

$$ z = a + i \cdot b, $$

in the 2D plane $\mathbb{C}$, where $i$ is the imaginary unit ($i^2 = -1$). This
association becomes even stronger if we look at Euler’s famous notation
of complex numbers along the unit circle,

$$ z = e^{i\theta} = \cos(\theta) + i \cdot \sin(\theta), \tag{18.10} $$


where $e \approx 2.71828$ is the Euler number. If we take the expression
$e^{i\theta}$ as a function of the angle $\theta$ rotating around the unit circle, we
obtain a “complex-valued sinusoid” whose real and imaginary parts
correspond to a cosine and a sine function, respectively,

$$ \mathrm{Re}(e^{i\theta}) = \cos(\theta), \qquad \mathrm{Im}(e^{i\theta}) = \sin(\theta). \tag{18.11} $$


Since $z = e^{i\theta}$ is placed on the unit circle, the amplitude of the
complex-valued sinusoid is $|z| = r = 1$. We can easily modify the
amplitude of this function by multiplying it by some real value $a > 0$,
that is,

$$ |a \cdot e^{i\theta}| = a \cdot |e^{i\theta}| = a. \tag{18.12} $$

Similarly, we can alter the phase of a complex-valued sinusoid by
adding a phase angle $\varphi$ in the function’s exponent or, equivalently,
by multiplying it by a complex-valued constant $c = e^{i\varphi}$,

$$ e^{i(\theta + \varphi)} = e^{i\theta} \cdot e^{i\varphi}. \tag{18.13} $$


In summary, multiplying by some positive real value affects only the
amplitude of a sinusoid, while multiplying by some complex value $c$ (with
unit amplitude $|c| = 1$) modifies only the function’s phase (without
changing its amplitude). In general, of course, multiplying by some
arbitrary complex value changes both the amplitude and the phase
of the function (also see Sec. A.3 in the Appendix).
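A minimal Java sketch of Eqns. (18.12) and (18.13) follows. Java has no built-in complex type, so the tiny `Cplx` class below is a hypothetical helper written only for this illustration; its `expI` method evaluates Euler's formula, Eqn. (18.10).

```java
/** Minimal complex number; a hypothetical helper, not a standard Java class. */
final class Cplx {
    final double re, im;

    Cplx(double re, double im) { this.re = re; this.im = im; }

    /** e^{i theta} = cos(theta) + i sin(theta), cf. Eqn. (18.10). */
    static Cplx expI(double theta) { return new Cplx(Math.cos(theta), Math.sin(theta)); }

    /** Complex product (a + ib)(c + id) = (ac - bd) + i(ad + bc). */
    Cplx times(Cplx z) { return new Cplx(re * z.re - im * z.im, re * z.im + im * z.re); }

    /** Product with a real scalar. */
    Cplx times(double a) { return new Cplx(a * re, a * im); }

    double abs() { return Math.hypot(re, im); }   // magnitude |z|
    double arg() { return Math.atan2(im, re); }   // phase angle in (-pi, pi]
}

class ComplexSinusoidDemo {
    public static void main(String[] args) {
        double theta = 0.3, phi = 0.5, a = 2.0;
        Cplx z = Cplx.expI(theta);               // on the unit circle, |z| = 1
        Cplx scaled = z.times(a);                // Eqn. (18.12): magnitude a, phase unchanged
        Cplx shifted = z.times(Cplx.expI(phi));  // Eqn. (18.13): magnitude 1, phase theta + phi
        System.out.printf("z:       |z| = %.3f, arg = %.3f%n", z.abs(), z.arg());
        System.out.printf("scaled:  |z| = %.3f, arg = %.3f%n", scaled.abs(), scaled.arg());
        System.out.printf("shifted: |z| = %.3f, arg = %.3f%n", shifted.abs(), shifted.arg());
    }
}
```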

The complex notation makes it easy to combine orthogonal pairs
of sine functions $\cos(\omega x)$ and $\sin(\omega x)$ with identical frequencies $\omega$
into a single expression,

$$ e^{i\theta} = e^{i\omega x} = \cos(\omega x) + i \cdot \sin(\omega x). \tag{18.14} $$

We  will  make  more  use  of  this  notation  later  (in  Sec.  18.1.4)  to  explain 
the  Fourier  transform. 

18.1.2  Fourier  Series  Representation  of  Periodic  Functions 

As we demonstrated in Eqn. (18.8), sinusoidal functions of arbitrary
frequency, amplitude, and phase can be described as the sum of suitably
weighted cosine and sine functions. One may wonder if non-sinusoidal
functions can also be decomposed into a sum of cosine and
sine functions. The answer is yes, of course. It was Fourier³ who
first extended this idea to arbitrary functions and showed that (almost)
any periodic function $g(x)$ with a fundamental frequency $\omega_0$
can be described as a (possibly infinite) sum of “harmonic” sinusoids,
that is,

$$ g(x) = \sum_{k=0}^{\infty} A_k \cdot \cos(k\omega_0 x) + B_k \cdot \sin(k\omega_0 x). \tag{18.15} $$

This is called a Fourier series, and the constant factors $A_k, B_k$ are
the Fourier coefficients of the function $g(x)$. Notice that in Eqn.
(18.15) the frequencies of the sine and cosine functions contributing
to the Fourier series are integral multiples (“harmonics”) of the
fundamental frequency $\omega_0$, including the zero frequency for $k = 0$. The
corresponding coefficients $A_k$ and $B_k$, which are initially unknown,
can be uniquely derived from the original function $g(x)$. This process
is commonly referred to as Fourier analysis.
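The text does not spell out how the coefficients are obtained. One standard way, assumed here, integrates over one period $T = 2\pi/\omega_0$: $A_0 = \frac{1}{T}\int_0^T g(x)\,dx$, and for $k \ge 1$, $A_k = \frac{2}{T}\int_0^T g(x)\cos(k\omega_0 x)\,dx$ and $B_k = \frac{2}{T}\int_0^T g(x)\sin(k\omega_0 x)\,dx$. The Java sketch below (hypothetical class `FourierSeriesDemo`) approximates these integrals with the midpoint rule and applies them to a square wave, whose exact coefficients are $B_k = 4/(\pi k)$ for odd $k$ and zero otherwise.

```java
import java.util.function.DoubleUnaryOperator;

/** Hypothetical sketch: numerical Fourier series coefficients for Eqn. (18.15). */
class FourierSeriesDemo {

    static final int N = 100_000; // midpoint-rule samples per period (an assumed value)

    /** Returns {A_k, B_k} of a function g with the given period; w0 = 2*pi/period. */
    static double[] coefficients(DoubleUnaryOperator g, double period, int k) {
        double w0 = 2 * Math.PI / period, dx = period / N, sumC = 0, sumS = 0;
        for (int j = 0; j < N; j++) {
            double x = (j + 0.5) * dx;                  // midpoint of the j-th subinterval
            double gx = g.applyAsDouble(x);
            sumC += gx * Math.cos(k * w0 * x);
            sumS += gx * Math.sin(k * w0 * x);
        }
        double scale = (k == 0 ? 1.0 : 2.0) / period;  // A_0 is the mean value of g
        return new double[] { scale * sumC * dx, scale * sumS * dx };
    }

    /** Partial sum of Eqn. (18.15) for k = 0, ..., kMax, evaluated at position x. */
    static double partialSum(DoubleUnaryOperator g, double period, int kMax, double x) {
        double w0 = 2 * Math.PI / period, sum = 0;
        for (int k = 0; k <= kMax; k++) {
            double[] ab = coefficients(g, period, k);
            sum += ab[0] * Math.cos(k * w0 * x) + ab[1] * Math.sin(k * w0 * x);
        }
        return sum;
    }

    /** Square wave with period 2*pi: +1 on [0, pi), -1 on [pi, 2*pi). */
    static double square(double x) {
        double t = x - 2 * Math.PI * Math.floor(x / (2 * Math.PI)); // reduce x to [0, 2*pi)
        return t < Math.PI ? 1.0 : -1.0;
    }

    public static void main(String[] args) {
        DoubleUnaryOperator sq = FourierSeriesDemo::square;
        for (int k = 0; k <= 5; k++) {
            double[] ab = coefficients(sq, 2 * Math.PI, k);
            System.out.printf("k=%d  A=%+.5f  B=%+.5f%n", k, ab[0], ab[1]);
        }
        System.out.println("partial sum (kMax = 25) at x = pi/2: "
                + partialSum(sq, 2 * Math.PI, 25, Math.PI / 2));
    }
}
```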

18.1.3  Fourier  Integral 

Fourier did not want to limit this concept to periodic functions and
postulated that nonperiodic functions, too, could be described as
sums of sine and cosine functions. While this proved to be true in
principle, it generally requires (beyond multiples of the fundamental
frequency $k\omega_0$) infinitely many, densely spaced frequencies! The
resulting decomposition,

$$ g(x) = \int_0^{\infty} A_\omega \cdot \cos(\omega x) + B_\omega \cdot \sin(\omega x) \, d\omega, \tag{18.16} $$

is called a Fourier integral, and the coefficients $A_\omega, B_\omega$ are again
the weights for the corresponding cosine and sine functions with the
(continuous) frequency $\omega$. The Fourier integral is the basis of the
Fourier spectrum and the Fourier transform, as will be described (for
details, see, e.g., [35, Ch. 15, Sec. 15.3]).

3 Jean-Baptiste Joseph de Fourier (1768-1830).

In Eqn. (18.16), every coefficient $A_\omega$ and $B_\omega$ specifies the amplitude
of the corresponding cosine or sine function, respectively. The
coefficients thus define “how much of each frequency” contributes
to a given function or signal $g(x)$. But what are the proper values
of these coefficients for a given function $g(x)$, and can they be
determined uniquely? The answer is yes again, and the “recipe” for
computing the coefficients is amazingly simple:

$$ \begin{aligned} A_\omega = A(\omega) &= \frac{1}{\pi} \int_{-\infty}^{\infty} g(x) \cdot \cos(\omega x) \, dx, \\ B_\omega = B(\omega) &= \frac{1}{\pi} \int_{-\infty}^{\infty} g(x) \cdot \sin(\omega x) \, dx. \end{aligned} \tag{18.17} $$


Since this representation of the function $g(x)$ involves infinitely many
densely spaced frequency values $\omega$, the corresponding coefficients
$A(\omega)$ and $B(\omega)$ are indeed continuous functions as well. They hold
the continuous distribution of frequency components contained in the
original signal, which is called a “spectrum”.

Thus the Fourier integral in Eqn. (18.16) describes the original
function $g(x)$ as a sum of infinitely many cosine and sine functions,
with the corresponding Fourier coefficients contained in the functions
$A(\omega)$ and $B(\omega)$. In addition, a signal $g(x)$ is uniquely and fully
represented by the corresponding coefficient functions $A(\omega)$ and $B(\omega)$.
We know from Eqn. (18.17) how to compute the spectrum for a given
function $g(x)$, and Eqn. (18.16) explains how to reconstruct the original
function from its spectrum if it is ever needed.
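To make Eqn. (18.17) concrete, the Java sketch below (hypothetical class `FourierIntegralDemo`) approximates $A(\omega)$ and $B(\omega)$ with the midpoint rule. The test signal is the one-sided exponential $g(x) = e^{-x}$ for $x \ge 0$ (and 0 otherwise). We chose it because its coefficients have the closed forms $A(\omega) = \frac{1}{\pi(1+\omega^2)}$ and $B(\omega) = \frac{\omega}{\pi(1+\omega^2)}$, so both are nonzero. Truncating the integration range to $[0, 40]$ is an assumption that works here only because $e^{-40}$ is negligible.

```java
import java.util.function.DoubleUnaryOperator;

/** Hypothetical sketch: Fourier-integral coefficients A(w), B(w) of Eqn. (18.17). */
class FourierIntegralDemo {

    /** Midpoint-rule approximation of the integral of f over [lo, hi] with n subintervals. */
    static double integrate(DoubleUnaryOperator f, double lo, double hi, int n) {
        double dx = (hi - lo) / n, sum = 0;
        for (int j = 0; j < n; j++) sum += f.applyAsDouble(lo + (j + 0.5) * dx);
        return sum * dx;
    }

    /** Returns {A(w), B(w)}, assuming g is (numerically) zero outside [lo, hi]. */
    static double[] coefficients(DoubleUnaryOperator g, double w, double lo, double hi) {
        int n = 200_000;
        double a = integrate(x -> g.applyAsDouble(x) * Math.cos(w * x), lo, hi, n) / Math.PI;
        double b = integrate(x -> g.applyAsDouble(x) * Math.sin(w * x), lo, hi, n) / Math.PI;
        return new double[] { a, b };
    }

    public static void main(String[] args) {
        // One-sided exponential: g(x) = exp(-x) for x >= 0, 0 otherwise
        DoubleUnaryOperator g = x -> x >= 0 ? Math.exp(-x) : 0.0;
        for (int i = 0; i <= 3; i++) {
            double w = i;
            double[] ab = coefficients(g, w, 0, 40);
            double exactA = 1 / (Math.PI * (1 + w * w));
            double exactB = w / (Math.PI * (1 + w * w));
            System.out.printf("w=%.0f  A=%.6f (exact %.6f)  B=%.6f (exact %.6f)%n",
                    w, ab[0], exactA, ab[1], exactB);
        }
    }
}
```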


18.1.4  Fourier  Spectrum  and  Transformation 


There is now only a small remaining step from the decomposition
of a function $g(x)$, as shown in Eqn. (18.17), to the “real” Fourier
transform. In contrast to the Fourier integral, the Fourier transform
treats both the original signal and the corresponding spectrum as
complex-valued functions, which considerably simplifies the resulting
notation.

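Based on the functions $A(\omega)$ and $B(\omega)$ defined in the Fourier
integral (Eqn. (18.17)), the Fourier spectrum $G(\omega)$ of a function $g(x)$
is given as

$$ \begin{aligned} G(\omega) &= \sqrt{\frac{\pi}{2}} \cdot \big[ A(\omega) - i \cdot B(\omega) \big] \\ &= \sqrt{\frac{\pi}{2}} \cdot \left[ \frac{1}{\pi} \int_{-\infty}^{\infty} g(x) \cdot \cos(\omega x) \, dx - i \cdot \frac{1}{\pi} \int_{-\infty}^{\infty} g(x) \cdot \sin(\omega x) \, dx \right] \\ &= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} g(x) \cdot \big[ \cos(\omega x) - i \cdot \sin(\omega x) \big] \, dx, \end{aligned} \tag{18.18} $$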
with $g(x), G(\omega) \in \mathbb{C}$. Using Euler’s notation of complex values (see
Eqn. (18.14)) yields the continuous Fourier spectrum in Eqn. (18.18)
in its common form:

$$ G(\omega) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} g(x) \cdot \big[ \cos(\omega x) - i \cdot \sin(\omega x) \big] \, dx = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} g(x) \cdot e^{-i\omega x} \, dx. \tag{18.19} $$


The transition from the function $g(x)$ to its Fourier spectrum $G(\omega)$ is
called the Fourier transform⁴ ($\mathcal{F}$). Conversely, the original function
$g(x)$ can be reconstructed completely from its Fourier spectrum $G(\omega)$
using the inverse Fourier transform⁵ ($\mathcal{F}^{-1}$), defined as

$$ g(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} G(\omega) \cdot \big[ \cos(\omega x) + i \cdot \sin(\omega x) \big] \, d\omega = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} G(\omega) \cdot e^{i\omega x} \, d\omega. \tag{18.20} $$


In general, even if one of the involved functions ($g(x)$ or $G(\omega)$)
is real-valued (which is usually the case for physical signals $g(x)$),
the other function is complex-valued. One may also note that the
forward transformation $\mathcal{F}$ (Eqn. (18.19)) and the inverse transformation
$\mathcal{F}^{-1}$ (Eqn. (18.20)) are almost completely symmetrical, the
sign of the exponent being the only difference.⁶ The spectrum produced
by the Fourier transform is a new representation of the signal
in a space of frequencies. Evidently, this “frequency space” and the
original “signal space” are dual and interchangeable mathematical
representations.
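The forward transform can be approximated numerically in the same way. The following Java sketch (hypothetical class `ContinuousFT`) evaluates Eqn. (18.19) with the midpoint rule on a finite interval, splitting $e^{-i\omega x}$ into its cosine and sine parts as in Eqn. (18.18). For the Gaussian $g(x) = e^{-x^2/2}$ (Table 18.1 with $\sigma = 1$), the result should match $e^{-\omega^2/2}$, with a vanishing imaginary part. The interval $[-20, 20]$ and the sample count are assumptions suited to this rapidly decaying signal, not general-purpose settings.

```java
import java.util.function.DoubleUnaryOperator;

/** Hypothetical sketch: midpoint-rule evaluation of the continuous Fourier transform, Eqn. (18.19). */
class ContinuousFT {

    /** Approximates G(w) = 1/sqrt(2 pi) * integral of g(x) e^{-iwx} dx over [lo, hi]; returns {Re, Im}. */
    static double[] transform(DoubleUnaryOperator g, double w, double lo, double hi, int n) {
        double dx = (hi - lo) / n, re = 0, im = 0;
        for (int j = 0; j < n; j++) {
            double x = lo + (j + 0.5) * dx;
            double gx = g.applyAsDouble(x);
            re += gx * Math.cos(w * x);   // e^{-iwx} = cos(wx) - i sin(wx), cf. Eqn. (18.18)
            im -= gx * Math.sin(w * x);
        }
        double s = dx / Math.sqrt(2 * Math.PI);
        return new double[] { re * s, im * s };
    }

    public static void main(String[] args) {
        DoubleUnaryOperator gauss = x -> Math.exp(-x * x / 2);  // Gaussian with sigma = 1
        for (int i = 0; i <= 6; i++) {
            double w = 0.5 * i;
            double[] G = transform(gauss, w, -20, 20, 100_000);
            System.out.printf("w=%.1f  Re=%.6f  Im=%+.1e  exp(-w^2/2)=%.6f%n",
                    w, G[0], G[1], Math.exp(-w * w / 2));
        }
    }
}
```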

18.1.5  Fourier  Transform  Pairs 

The relationship between a function $g(x)$ and its Fourier spectrum
$G(\omega)$ is unique in both directions: the Fourier spectrum is uniquely
defined for a given function, and for any Fourier spectrum there is
only one matching signal. The two functions $g(x)$ and $G(\omega)$ thus form
a transform pair,

$$ g(x) \;\circ\!\!-\!\!\bullet\; G(\omega). $$

Table 18.1 lists the transform pairs for some selected analytical
functions, which are also shown graphically in Figs. 18.3 and 18.4.

The Fourier spectrum of a cosine function $\cos(\omega_0 x)$, for example,
consists of two separate thin pulses arranged symmetrically at a
distance $\omega_0$ from the origin (Fig. 18.3(a,c)). Intuitively, this corresponds
to our physical understanding of a spectrum (e.g., if we think
of a pure monophonic sound in acoustics or the thin line produced by
some extremely pure color in the optical spectrum). Increasing the
frequency $\omega_0$ would move the corresponding pulses in the spectrum
away from the origin. Notice that the spectrum of the cosine function
is real-valued, the imaginary part being zero. Of course, the same
relation holds for the sine function (Fig. 18.3(b,d)), with the only
difference being that the pulses have different polarities and appear
in the imaginary part of the spectrum. In this case, the real part of
the spectrum $G(\omega)$ is zero.

4 Also called the “direct” or “forward” transformation.

5 Also called “backward” transformation.

6 Various definitions of the Fourier transform are in common use. They
are contrasted mainly by the constant factors outside the integral and
the signs of the exponents in the forward and inverse transforms, but all
versions are equivalent in principle. The symmetric variant shown here
uses the same factor ($1/\sqrt{2\pi}$) in the forward and inverse transforms.

Table 18.1

Fourier transforms of selected analytical functions; $\delta()$ denotes the “impulse” or Dirac function (see Sec. 18.2.1).

Function | Transform pair $g(x) \;\circ\!\!-\!\!\bullet\; G(\omega)$ | Figure
Cosine function with frequency $\omega_0$ | $g(x) = \cos(\omega_0 x)$; $G(\omega) = \sqrt{\frac{\pi}{2}} \cdot \big(\delta(\omega + \omega_0) + \delta(\omega - \omega_0)\big)$ | 18.3(a,c)
Sine function with frequency $\omega_0$ | $g(x) = \sin(\omega_0 x)$; $G(\omega) = i\sqrt{\frac{\pi}{2}} \cdot \big(\delta(\omega + \omega_0) - \delta(\omega - \omega_0)\big)$ | 18.3(b,d)
Gaussian function of width $\sigma$ | $g(x) = \frac{1}{\sigma} \cdot e^{-\frac{x^2}{2\sigma^2}}$; $G(\omega) = e^{-\frac{\sigma^2 \omega^2}{2}}$ | 18.4(a,b)
Rectangular pulse of width $2b$ | $g(x) = \Pi_b(x) = 1$ for $\lvert x \rvert \le b$, $0$ otherwise; $G(\omega) = \frac{2b \sin(b\omega)}{\sqrt{2\pi}\, b\omega}$ | 18.4(c,d)

The Gaussian function is particularly interesting because its
Fourier spectrum is also a Gaussian function (Fig. 18.4(a,b))! It is
one of the few examples where the function type in frequency space
is the same as in signal space. With the Gaussian function, it is also
easy to see that stretching a function in signal space corresponds to
compressing its spectrum and vice versa.

The Fourier transform of a rectangular pulse (Fig. 18.4(c,d)) is the
“Sinc” function of type $\sin(x)/x$. With increasing frequencies, this
function drops off quite slowly (its magnitude decays only in proportion
to $1/\omega$), which shows that the components contained in the original
rectangular signal are spread out over a large frequency range. Thus
a rectangular pulse function exhibits a very wide spectrum in general.
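The rectangular-pulse entry of Table 18.1 can be checked the same way. The hypothetical class `RectPulseFT` below integrates Eqn. (18.19) over the pulse support $[-b, b]$ with the midpoint rule and compares the result with the closed form $\frac{2b \sin(b\omega)}{\sqrt{2\pi}\, b\omega}$. Only the real part is computed because the transform of a real, even function has no imaginary part. Printing the values for growing $\omega$ also shows how slowly they fall off.

```java
/** Hypothetical sketch: numerical check of the rectangular-pulse entry in Table 18.1. */
class RectPulseFT {

    /** Midpoint-rule G(w) for the pulse (1 on [-b, b], 0 elsewhere); the Im part vanishes by symmetry. */
    static double numeric(double b, double w, int n) {
        double dx = 2 * b / n, re = 0;
        for (int j = 0; j < n; j++) re += Math.cos(w * (-b + (j + 0.5) * dx));
        return re * dx / Math.sqrt(2 * Math.PI);
    }

    /** Closed form 2b sin(bw) / (sqrt(2 pi) b w), using the limit 2b / sqrt(2 pi) at w = 0. */
    static double closedForm(double b, double w) {
        double sinc = (w == 0) ? 1.0 : Math.sin(b * w) / (b * w);
        return 2 * b * sinc / Math.sqrt(2 * Math.PI);
    }

    public static void main(String[] args) {
        for (int i = 0; i <= 5; i++) {
            double w = 2.0 * i;
            System.out.printf("w=%4.1f  numeric=%+.6f  closed form=%+.6f%n",
                    w, numeric(1, w, 10_000), closedForm(1, w));
        }
    }
}
```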

18.1.6  Important  Properties  of  the  Fourier  Transform 
Symmetry 

The Fourier spectrum extends over positive and negative frequencies
and could, in principle, be an arbitrary complex-valued function.
However, in many situations, the spectrum is symmetric about its
origin (see, e.g., [43, p. 178]). In particular, the Fourier transform of
a real-valued signal $g(x) \in \mathbb{R}$ is a so-called Hermitian function with the
property

$$ G(\omega) = G^*(-\omega), \tag{18.21} $$

where $G^*$ denotes the complex conjugate of $G$ (see also Sec. A.3 in
the Appendix).
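Eqn. (18.21) can be verified numerically with a real-valued signal that is not symmetric, so that the spectrum has nonzero real and imaginary parts. The sketch below (hypothetical class `HermitianCheck`) uses the one-sided exponential $g(x) = e^{-x}$ for $x \ge 0$. Under the definition in Eqn. (18.19), its transform is $G(\omega) = \frac{1}{\sqrt{2\pi}\,(1 + i\omega)}$, so $G(\omega)$ and $G(-\omega)$ should have equal real parts and imaginary parts of opposite sign.

```java
import java.util.function.DoubleUnaryOperator;

/** Hypothetical sketch: checks the Hermitian symmetry G(w) = G*(-w) of Eqn. (18.21). */
class HermitianCheck {

    /** Midpoint-rule evaluation of Eqn. (18.19) over [lo, hi]; returns {Re, Im}. */
    static double[] ft(DoubleUnaryOperator g, double w, double lo, double hi, int n) {
        double dx = (hi - lo) / n, re = 0, im = 0;
        for (int j = 0; j < n; j++) {
            double x = lo + (j + 0.5) * dx, gx = g.applyAsDouble(x);
            re += gx * Math.cos(w * x);
            im -= gx * Math.sin(w * x);
        }
        double s = dx / Math.sqrt(2 * Math.PI);
        return new double[] { re * s, im * s };
    }

    public static void main(String[] args) {
        // Real-valued but non-symmetric signal: g(x) = exp(-x) for x >= 0, 0 otherwise
        DoubleUnaryOperator g = x -> x >= 0 ? Math.exp(-x) : 0.0;
        for (int i = 1; i <= 4; i++) {
            double w = 0.5 * i;
            double[] pos = ft(g, w, 0, 40, 200_000), neg = ft(g, -w, 0, 40, 200_000);
            // Expect equal real parts and imaginary parts of opposite sign
            System.out.printf("w=%.1f  G(w)=%+.5f%+.5fi  G(-w)=%+.5f%+.5fi%n",
                    w, pos[0], pos[1], neg[0], neg[1]);
        }
    }
}
```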
Fig. 18.3

Fourier transform pairs: cosine and sine functions. [Plots not reproduced.]

Fig. 18.4

Fourier transform pairs: Gaussian functions and square pulses. [Plots not reproduced; panels include (c) pulse with $b = 1$, $g(x) = \Pi_1(x)$, and (d) pulse with $b = 2$, $g(x) = \Pi_2(x)$.]
Linearity

The Fourier transform is also a linear operation such that multiplying
the signal by a constant value $a \in \mathbb{C}$ scales the corresponding
spectrum by the same amount,

$$ a \cdot g(x) \;\circ\!\!-\!\!\bullet\; a \cdot G(\omega). \tag{18.22} $$

Linearity also means that the transform of the sum of two signals
$g(x) = g_1(x) + g_2(x)$ is identical to the sum of their individual
transforms $G_1(\omega)$ and $G_2(\omega)$ and thus

$$ g_1(x) + g_2(x) \;\circ\!\!-\!\!\bullet\; G_1(\omega) + G_2(\omega). \tag{18.23} $$

Similarity 

If the original function $g(x)$ is scaled in space or time, the opposite
effect appears in the corresponding Fourier spectrum. In particular,
as observed on the Gaussian function in Fig. 18.4, scaling a signal
by a factor $s$ (i.e., $g(x) \rightarrow g(sx)$) scales its Fourier spectrum by the
reciprocal factor:

$$ g(sx) \;\circ\!\!-\!\!\bullet\; \frac{1}{|s|} \cdot G\!\left(\frac{\omega}{s}\right). \tag{18.24} $$

Thus stretching the signal ($|s| < 1$) shortens its spectrum, and
compressing the signal ($|s| > 1$) stretches its spectrum.
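A numerical check of Eqn. (18.24) with the Gaussian $g(x) = e^{-x^2/2}$ and $s = 2$ is sketched below (hypothetical class `SimilarityCheck`). The transform of the compressed signal $g(2x)$ should equal $\frac{1}{2}\,G(\omega/2)$, i.e., a spectrum that is twice as wide and half as high. Only the real part is computed; this shortcut is valid here because both signals are real and even.

```java
import java.util.function.DoubleUnaryOperator;

/** Hypothetical sketch: numerical check of the similarity property, Eqn. (18.24). */
class SimilarityCheck {

    /** Real part of Eqn. (18.19) on [-20, 20] by the midpoint rule (Im = 0 for real, even g). */
    static double ftRe(DoubleUnaryOperator g, double w) {
        int n = 100_000;
        double lo = -20, dx = 40.0 / n, re = 0;
        for (int j = 0; j < n; j++) {
            double x = lo + (j + 0.5) * dx;
            re += g.applyAsDouble(x) * Math.cos(w * x);
        }
        return re * dx / Math.sqrt(2 * Math.PI);
    }

    public static void main(String[] args) {
        DoubleUnaryOperator g = x -> Math.exp(-x * x / 2);   // G(w) = exp(-w^2/2)
        double s = 2.0;                                       // g(2x) is g compressed by a factor of 2
        DoubleUnaryOperator gs = x -> g.applyAsDouble(s * x);
        for (int w = 0; w <= 4; w++) {
            double lhs = ftRe(gs, w);                         // transform of g(sx)
            double rhs = ftRe(g, w / s) / Math.abs(s);        // (1/|s|) G(w/s)
            System.out.printf("w=%d  F[g(sx)]=%.6f  G(w/s)/|s|=%.6f%n", w, lhs, rhs);
        }
    }
}
```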

Shift  property 

If the original function $g(x)$ is shifted by a distance $d$ along its coordinate
axis (i.e., $g(x) \rightarrow g(x - d)$), then the Fourier spectrum is multiplied
by the complex value $e^{-i\omega d}$, which depends on $\omega$:

$$ g(x - d) \;\circ\!\!-\!\!\bullet\; e^{-i\omega d} \cdot G(\omega). \tag{18.25} $$

Since $e^{-i\omega d}$ lies on the unit circle, the multiplication causes a phase
shift on the spectral values (i.e., a redistribution between the real
and imaginary components) without altering the magnitude $|G(\omega)|$.
Obviously, the amount (angle) of phase shift ($\omega d$) is proportional to
the angular frequency $\omega$.
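The following sketch (hypothetical class `ShiftCheck`) illustrates Eqn. (18.25) numerically. It transforms a Gaussian and a copy shifted by $d = 1.5$ and compares their magnitudes and phases. The magnitudes should agree, and the phase of the shifted version should equal $-\omega d$ as long as $|\omega d| < \pi$; beyond that, `Math.atan2` wraps the angle into $(-\pi, \pi]$.

```java
import java.util.function.DoubleUnaryOperator;

/** Hypothetical sketch: numerical check of the shift property, Eqn. (18.25). */
class ShiftCheck {

    /** Midpoint-rule evaluation of Eqn. (18.19) on [-20, 20]; returns {Re, Im}. */
    static double[] ft(DoubleUnaryOperator g, double w) {
        int n = 100_000;
        double lo = -20, dx = 40.0 / n, re = 0, im = 0;
        for (int j = 0; j < n; j++) {
            double x = lo + (j + 0.5) * dx, gx = g.applyAsDouble(x);
            re += gx * Math.cos(w * x);
            im -= gx * Math.sin(w * x);
        }
        double s = dx / Math.sqrt(2 * Math.PI);
        return new double[] { re * s, im * s };
    }

    public static void main(String[] args) {
        double d = 1.5;                                       // shift distance
        DoubleUnaryOperator g = x -> Math.exp(-x * x / 2);
        DoubleUnaryOperator shifted = x -> g.applyAsDouble(x - d);
        for (int i = 1; i <= 4; i++) {
            double w = 0.5 * i;                               // keeps |w*d| <= 3 < pi
            double[] G = ft(g, w), Gd = ft(shifted, w);
            System.out.printf("w=%.1f  |G|=%.6f  |G_d|=%.6f  phase(G_d)=%.4f  -w*d=%.4f%n",
                    w, Math.hypot(G[0], G[1]), Math.hypot(Gd[0], Gd[1]),
                    Math.atan2(Gd[1], Gd[0]), -w * d);
        }
    }
}
```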

Convolution  property 

From the image-processing point of view, the most interesting property
of the Fourier transform is its relation to linear convolution (see
Ch. 5, Sec. 5.3.1). Let us assume that we have two functions $g(x)$
and $h(x)$ and their corresponding Fourier spectra $G(\omega)$ and $H(\omega)$,
respectively. If the original functions are subject to linear convolution
(i.e., $g(x) * h(x)$), then the Fourier transform of the result equals the
(pointwise) product of the individual Fourier transforms $G(\omega)$ and
$H(\omega)$:

$$ g(x) * h(x) \;\circ\!\!-\!\!\bullet\; G(\omega) \cdot H(\omega). \tag{18.26} $$

Due to the duality of signal space and frequency space, the same also
holds in the opposite direction; i.e., a pointwise multiplication of two
signals is equivalent to convolving the corresponding spectra:

$$ g(x) \cdot h(x) \;\circ\!\!-\!\!\bullet\; G(\omega) * H(\omega). \tag{18.27} $$

A multiplication of two functions in one space (signal or frequency
space) thus corresponds to a linear convolution of the corresponding
transforms in the opposite space. Strictly speaking, with the symmetric
definition in Eqns. (18.19) and (18.20), the right-hand sides of
Eqns. (18.26) and (18.27) carry the additional constant factors $\sqrt{2\pi}$
and $1/\sqrt{2\pi}$, respectively; such constant factors do not affect the
structure of the result and are commonly ignored.
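Eqn. (18.26) can also be checked numerically. The hypothetical class `ConvolutionCheck` below convolves the Gaussian $g(x) = e^{-x^2/2}$ with itself on a sampled grid, transforms the result with Eqn. (18.19), and compares it with $G(\omega) \cdot H(\omega) = e^{-\omega^2}$. With the symmetric definition of Eqn. (18.19), the ratio between the two comes out as the constant $\sqrt{2\pi} \approx 2.5066$ rather than 1. Such constant factors depend on the chosen normalization (see footnote 6), not on $\omega$. The grid limits and spacing are assumptions that suit these Gaussians only.

```java
import java.util.function.DoubleUnaryOperator;

/** Hypothetical sketch: numerical check of the convolution property, Eqn. (18.26). */
class ConvolutionCheck {
    static final double LO = -15, HI = 15;   // assumed support; the Gaussians used are negligible beyond it
    static final int N = 2000;
    static final double DX = (HI - LO) / N;

    static double pos(int j) { return LO + (j + 0.5) * DX; }   // midpoint sample positions

    /** Samples of (g * h)(x) = integral of g(t) h(x - t) dt at the grid positions. */
    static double[] convolve(DoubleUnaryOperator g, DoubleUnaryOperator h) {
        double[] c = new double[N];
        for (int j = 0; j < N; j++) {
            double sum = 0;
            for (int k = 0; k < N; k++) sum += g.applyAsDouble(pos(k)) * h.applyAsDouble(pos(j) - pos(k));
            c[j] = sum * DX;
        }
        return c;
    }

    /** Real part of Eqn. (18.19) for sampled, real and even data (the imaginary part vanishes). */
    static double ftRe(double[] c, double w) {
        double re = 0;
        for (int j = 0; j < N; j++) re += c[j] * Math.cos(w * pos(j));
        return re * DX / Math.sqrt(2 * Math.PI);
    }

    public static void main(String[] args) {
        DoubleUnaryOperator g = x -> Math.exp(-x * x / 2);   // G(w) = H(w) = exp(-w^2/2)
        double[] conv = convolve(g, g);
        for (int i = 0; i <= 4; i++) {
            double w = 0.5 * i;
            double lhs = ftRe(conv, w);
            double product = Math.exp(-w * w / 2) * Math.exp(-w * w / 2);
            System.out.printf("w=%.1f  F[g*h]=%.6f  G(w)H(w)=%.6f  ratio=%.6f%n",
                    w, lhs, product, lhs / product);
        }
    }
}
```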
18.2 Working with Discrete Signals

The definition of the continuous Fourier transform in Sec. 18.1 is of
little use for numerical computation on a computer. Arbitrary continuous
(and possibly infinite) functions cannot be represented in
practice, nor can the required integrals be computed exactly. In reality, we
must always deal with discrete signals, and we therefore need a new
version of the Fourier transform that treats signals and spectra as
finite data vectors: the “discrete” Fourier transform. Before continuing
with this issue, we want to use what we have learned so far to take a
closer look at the process of discretizing signals in general.


18.2.1  Sampling 

We first consider the question of how a continuous function can be
converted to a discrete signal in the first place. This process is usually
called “sampling”, i.e., taking samples of the continuous function
at certain points in time or space, usually spaced at regular
distances. To describe this step in a simple but formal way, we
require an inconspicuous but nevertheless important piece from the
mathematician’s toolbox.


The impulse function $\delta(x)$

We casually encountered the impulse function (also called the delta
or Dirac function) earlier when we looked at the impulse response
of linear filters (see Ch. 5, Sec. 5.3.4) and in the Fourier transforms
of the cosine and sine functions (Fig. 18.3). This function, which
models a continuous “ideal” impulse, is unusual in several respects:
its value is zero everywhere except at the origin, where it is nonzero
(though undefined), but its integral is one, that is,

$$ \int_{-\infty}^{\infty} \delta(x) \, dx = 1. \tag{18.28} $$


One could imagine $\delta(x)$ as a single pulse at position $x = 0$ that
is infinitesimally narrow but still has unit area (Eqn. (18.28)). Also
remarkable is the impulse function’s behavior under scaling along
the time (or space) axis (i.e., $\delta(x) \rightarrow \delta(sx)$), with

$$ \delta(sx) = \frac{1}{|s|} \cdot \delta(x), \tag{18.29} $$

for $s \neq 0$. Despite the fact that $\delta(x)$ does not exist in physical
reality and cannot be plotted (the corresponding plots in Fig. 18.3
are for illustration only), this function is a useful mathematical tool
for describing the sampling process, as will be shown.


Sampling  with  the  impulse  function 

Using the concept of the ideal impulse, the sampling process can be
described in a straightforward and intuitive way.⁷ If a continuous
function $g(x)$ is multiplied with the impulse function $\delta(x)$, we obtain

$$ \bar{g}(x) = g(x) \cdot \delta(x) = \begin{cases} g(0) & \text{for } x = 0, \\ 0 & \text{otherwise.} \end{cases} \tag{18.30} $$

7 The following description is intentionally a bit superficial (in a mathematical
sense). See, for example, [43, 128] for more precise coverage of
these topics.


The resulting function $\bar{g}(x)$ consists of a single pulse at position 0
whose height corresponds to the original function value $g(0)$ (at position
0). Thus, by multiplying the function $g(x)$ by the impulse
function, we obtain a single discrete sample value of $g(x)$ at position
$x = 0$. If the impulse function $\delta(x)$ is shifted by a distance $x_0$, we
can sample $g(x)$ at an arbitrary position $x = x_0$,

$$ \bar{g}(x) = g(x) \cdot \delta(x - x_0) = \begin{cases} g(x_0) & \text{for } x = x_0, \\ 0 & \text{otherwise.} \end{cases} \tag{18.31} $$


Here $\delta(x - x_0)$ is the impulse function shifted by $x_0$, and the resulting
function $\bar{g}(x)$ is zero except at position $x_0$, where it contains the
original function value $g(x_0)$. This relationship is illustrated in Fig.
18.5 for the sampling position $x_0 = 3$.


Fig. 18.5

Sampling with the impulse function. The continuous signal $g(x)$ is sampled at position $x_0 = 3$ by multiplying $g(x)$ by a shifted impulse function $\delta(x - 3)$.


To sample the function $g(x)$ at more than one position simultaneously
(e.g., at positions $x_1$ and $x_2$), we use two separately shifted
versions of the impulse function, multiply $g(x)$ by both of them, and
simply add the resulting function values. In this particular case, we
get

$$ \begin{aligned} \bar{g}(x) &= g(x) \cdot \delta(x - x_1) + g(x) \cdot \delta(x - x_2) && (18.32) \\ &= g(x) \cdot \big[\delta(x - x_1) + \delta(x - x_2)\big] && (18.33) \\ &= \begin{cases} g(x_1) & \text{for } x = x_1, \\ g(x_2) & \text{for } x = x_2, \\ 0 & \text{otherwise.} \end{cases} && (18.34) \end{aligned} $$

From Eqn. (18.33), sampling a continuous function $g(x)$ at $N$ positions
$x_i = 1, 2, \ldots, N$ can thus be described as the sum of the $N$
individual samples, that is,

$$ \bar{g}(x) = g(x) \cdot \big[\delta(x - 1) + \delta(x - 2) + \cdots + \delta(x - N)\big] = g(x) \cdot \sum_{i=1}^{N} \delta(x - i). \tag{18.35} $$

The comb function

The sum of shifted impulses $\delta(x - i)$ in Eqn. (18.35) is called a
pulse sequence or pulse train. Extending this sequence to infinity in
both directions, we obtain the “comb” or “Shah” function

$$ \mathrm{III}(x) = \sum_{i=-\infty}^{\infty} \delta(x - i). \tag{18.36} $$

The process of discretizing a continuous function by taking samples
at regular integral intervals can thus be written simply as

$$ \bar{g}(x) = g(x) \cdot \mathrm{III}(x), \tag{18.37} $$

that is, as a pointwise multiplication of the original signal $g(x)$ with
the comb function $\mathrm{III}(x)$. As Fig. 18.6 illustrates, the function values
of $g(x)$ at integral positions $x_i \in \mathbb{Z}$ are transferred to the discrete
function $\bar{g}(x_i)$ and ignored at all non-integer positions.

Of course, the sampling interval (i.e., the distance between adjacent
samples) is not restricted to 1. To take samples at regular but
arbitrary intervals $\tau$, the sampling function $\mathrm{III}(x)$ is simply scaled
along the time or space axis; that is,

$$ \bar{g}(x) = g(x) \cdot \mathrm{III}\!\left(\frac{x}{\tau}\right), \quad \text{for } \tau > 0. \tag{18.38} $$
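In a program, the comb function never appears explicitly: multiplying by $\mathrm{III}(x/\tau)$ and keeping only the values at the pulses amounts to evaluating $g$ at the positions $x = k\tau$ and storing the results in an array. A minimal Java sketch of this step follows; the class name `Sampler` and the caller-chosen sample count are hypothetical illustration choices.

```java
import java.util.function.DoubleUnaryOperator;

/** Hypothetical sketch: discrete counterpart of Eqn. (18.38), keeping g(x) only at x = k * tau. */
class Sampler {

    /** Returns the samples g(k * tau) for k = 0, ..., count - 1. */
    static double[] sample(DoubleUnaryOperator g, double tau, int count) {
        if (tau <= 0) throw new IllegalArgumentException("tau must be > 0");
        double[] s = new double[count];
        for (int k = 0; k < count; k++) s[k] = g.applyAsDouble(k * tau);
        return s;
    }

    public static void main(String[] args) {
        // cos(x) sampled at x = 0, pi/2, pi, 3pi/2, 2pi
        double[] s = sample(Math::cos, Math.PI / 2, 5);
        System.out.println(java.util.Arrays.toString(s));
    }
}
```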

Effects  of  sampling  in  frequency  space 

Despite the elegant formulation made possible by the use of the comb
function, one may still wonder why all this math is necessary to describe
a process that appears intuitively to be so simple anyway. The
Fourier spectrum gives one answer to this question. Sampling a continuous
function has massive (though predictable) effects upon the
frequency spectrum of the resulting (discrete) signal. Using the comb
function as a formal model for the sampling process makes it relatively
easy to estimate and interpret those spectral effects. Similar to
the Gaussian (see Sec. 18.1.5), the comb function features the special
property that its Fourier transform

$$ \mathrm{III}(x) \;\circ\!\!-\!\!\bullet\; \mathrm{III}\!\left(\frac{1}{2\pi}\omega\right) \tag{18.39} $$

is again a comb function (i.e., the same type of function). In general,
the Fourier transform of a comb function scaled to an arbitrary
sampling interval $\tau$ is

$$ \mathrm{III}\!\left(\frac{x}{\tau}\right) \;\circ\!\!-\!\!\bullet\; \tau \cdot \mathrm{III}\!\left(\frac{\tau}{2\pi}\omega\right), \tag{18.40} $$

due to the similarity property of the Fourier transform (Eqn. (18.24)).
Figure 18.7 shows two examples of the comb function $\mathrm{III}_\tau(x)$ with
sampling intervals $\tau = 1$ and $\tau = 3$ and the corresponding Fourier
transforms.

Fig. 18.6

Sampling with the comb function. The original continuous signal $g(x)$ is multiplied by the comb function $\mathrm{III}(x)$. The function value $g(x)$ is transferred to the resulting function $\bar{g}(x)$ only at integral positions $x = x_i \in \mathbb{Z}$ and ignored at all non-integer positions.

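Now, what happens to the Fourier spectrum during discretization,
that is, when we multiply a function in signal space by the
comb function $\mathrm{III}(\frac{x}{\tau})$? We get the answer by recalling the convolution
property of the Fourier transform (Eqn. (18.26)): the product of
two functions in one space (signal or frequency space) corresponds to
the linear convolution of the transformed functions in the opposite
space, and thus

$$ \bar{g}(x) = g(x) \cdot \mathrm{III}\!\left(\frac{x}{\tau}\right) \;\circ\!\!-\!\!\bullet\; G(\omega) * \tau \cdot \mathrm{III}\!\left(\frac{\tau}{2\pi}\omega\right) = \bar{G}(\omega). \tag{18.41} $$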

We already know that the Fourier spectrum of the sampling function
is a comb function again and therefore consists of a sequence
of regularly spaced pulses (Fig. 18.7). In addition, we know that
convolving an arbitrary function with the impulse $\delta(x)$ returns the
original function; that is, $f(x) * \delta(x) = f(x)$ (see Ch. 5, Sec. 5.3.4).
Convolving with a shifted pulse $\delta(x - d)$ also reproduces the original
function $f(x)$, though shifted by the same distance $d$:

$$ f(x) * \delta(x - d) = f(x - d). \tag{18.42} $$


Fig. 18.7

Comb function and its Fourier transform. Comb function $\mathrm{III}_\tau(x)$ for the sampling interval $\tau = 1$ (a) and its Fourier transform (b). Comb function for $\tau = 3$ (c) and its Fourier transform (d). Note that the actual height of the $\delta$-pulses is undefined and shown only for illustration.

[Plot panels: (a) comb function $\mathrm{III}_1(x) = \mathrm{III}(x)$, $\tau = 1$; (c) comb function $\mathrm{III}_3(x) = \mathrm{III}(\frac{1}{3}x)$, $\tau = 3$.]

Fig. 18.8

Spectral effects of sampling. The spectrum $G(\omega)$ of the original continuous signal is assumed to be band-limited within the range $\pm\omega_{\max}$ (a). Sampling the signal at a rate (sampling frequency) $\omega_s = \omega_1$ causes the signal’s spectrum $G(\omega)$ to be replicated at multiples of $\omega_1$ along the frequency ($\omega$) axis (b). Obviously, the replicas in the spectrum do not overlap as long as $\omega_s > 2\omega_{\max}$. In (c), the sampling frequency $\omega_s = \omega_2$ is less than $2\omega_{\max}$, so there is overlap between the replicas in the spectrum, and frequency components are mirrored at $\omega_s/2$ and superimpose the original spectrum. This effect is called “aliasing”; the original spectrum (and thus the original signal) cannot be reproduced from such a corrupted spectrum.

[Plot panels: (a) $G(\omega)$, limited to $\pm\omega_{\max}$; (b) $\bar{G}_1(\omega)$.]


As a consequence, the spectrum $G(\omega)$ of the original continuous signal
becomes replicated in the Fourier spectrum $\bar{G}(\omega)$ of a sampled signal
at every pulse of the sampling function’s spectrum; that is, infinitely
many times (see Fig. 18.8(a,b))! Thus the resulting Fourier spectrum
is repetitive with a period that corresponds to the sampling
frequency $\omega_s = 2\pi/\tau$.

Aliasing  and  the  sampling  theorem 

As long as the spectral replicas in Ḡ(ω) created by the sampling process do not overlap, the original spectrum G(ω), and thus the original continuous function, can be reconstructed without loss from any isolated replica of G(ω) in the periodic spectrum Ḡ(ω). As we can see in Fig. 18.8, this requires that the frequencies contained in the original signal g(x) be within some upper limit ω_max; that is, the signal contains no components with frequencies greater than ω_max. The maximum allowed signal frequency ω_max depends upon the sampling frequency ω_s used to discretize the signal, with the requirement

$$\omega_{\max} \le \tfrac{1}{2}\cdot\omega_s \quad\text{or}\quad \omega_s \ge 2\cdot\omega_{\max}. \qquad (18.43)$$

Discretizing a continuous signal g(x) with frequency components in the range 0 ≤ ω ≤ ω_max thus requires a sampling frequency ω_s of at least twice the maximum signal frequency ω_max. If this condition is not met, the replicas in the spectrum of the sampled signal overlap (Fig. 18.8(c)) and the spectrum becomes corrupted. Consequently, the original signal cannot be recovered flawlessly from the sampled signal’s spectrum. This effect is commonly called “aliasing”.
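To make the folding concrete, the following hypothetical Java helpers (names are illustrative, not from this book) compute the frequency at which a real sinusoid of frequency f ≥ 0 appears after sampling at f_s: the frequency is first reduced modulo f_s (the spectral replicas) and then mirrored at f_s/2. The alias-free test follows Eqn. (18.43) literally, assuming the signal’s maximum frequency f_max is known.

```java
/** Hypothetical helpers illustrating the sampling theorem (frequencies in Hz, f >= 0). */
final class Aliasing {

    /** True if a signal band-limited to fMax can be sampled at fs without aliasing (Eqn. 18.43). */
    static boolean isAliasFree(double fMax, double fs) {
        return fs >= 2 * fMax;
    }

    /** Frequency in [0, fs/2] at which a real sinusoid of frequency f appears after sampling at fs. */
    static double apparentFrequency(double f, double fs) {
        double r = f % fs;            // spectral replicas repeat every fs
        return Math.min(r, fs - r);   // components above fs/2 are mirrored at fs/2
    }

    public static void main(String[] args) {
        double fs = 1000;
        System.out.println(apparentFrequency(300, fs));   // 300.0 (below fs/2, unchanged)
        System.out.println(apparentFrequency(700, fs));   // 300.0 (mirrored at 500)
        System.out.println(apparentFrequency(1300, fs));  // 300.0 (replica at 1000 + 300)
        System.out.println(isAliasFree(400, fs));         // true
    }
}
```

For f_s = 1000 Hz, a 700 Hz component is thus indistinguishable from a 300 Hz component after sampling, which corresponds to the overlapping replicas in Fig. 18.8(c).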

What  we  just  said  in  simple  terms  is  nothing  but  the  essence  of 
the  famous  “sampling  theorem”  formulated  by  Shannon  and  Nyquist 
(see,  e.g.,  [43,  p.  256]).  It  actually  states  that  the  sampling  frequency 
must  be  at  least  twice  the  bandwidth 8  of  the  continuous  signal  to  avoid 
aliasing  effects.  However,  if  we  assume  that  a  signal’s  frequency  range 

8  This  may  be  surprising  at  first  because  it  allows  a  signal  with  high 
frequency — but  low  bandwidth — to  be  sampled  (and  correctly  recon- 
starts at zero, then bandwidth and maximum frequency are the same anyway.

18.2.2  Discrete  and  Periodic  Functions 

Assume that we are given a continuous signal g(x) that is periodic with a period of length T. In this case, the corresponding Fourier spectrum G(ω) is a sequence of thin spectral lines equally spaced at a distance ω_0 = 2π/T. As discussed in Sec. 18.1.2, the Fourier spectrum of a periodic function can be represented as a Fourier series and is therefore discrete. Conversely, if a continuous signal g(x) is sampled at regular intervals τ, then the corresponding Fourier spectrum becomes periodic with a period of length ω_s = 2π/τ.

Sampling  in  signal  space  thus  leads  to  periodicity  in  frequency 
space  and  vice  versa.  Figure  18.9  illustrates  this  relationship  and  the 
transition  from  a  continuous  nonperiodic  signal  to  a  discrete  periodic 
function,  which  can  be  represented  as  a  finite  vector  of  numbers  and 
thus  easily  processed  on  a  computer. 

Thus, in general, the Fourier spectrum of a continuous, nonperiodic signal g(x) is also continuous and nonperiodic (Fig. 18.9(a,b)). However, if the signal g(x) is periodic, then the corresponding spectrum is discrete (Fig. 18.9(c,d)). Conversely, a discrete (but not necessarily periodic) signal leads to a periodic spectrum (Fig. 18.9(e,f)). Finally, if a signal is discrete and periodic with M samples per period, then its spectrum is also discrete and periodic with M values (Fig. 18.9(g,h)). Note that the particular signals and spectra in Fig. 18.9 were chosen for illustration only and do not really correspond with each other.

18.3  The  Discrete  Fourier  Transform  (DFT) 

In  the  case  of  a  discrete  periodic  signal,  only  a  finite  sequence  of  M 
sample  values  is  required  to  completely  represent  either  the  signal 
g(u) itself or its Fourier spectrum G(m).⁹ This representation as
finite  vectors  makes  it  straightforward  to  store  and  process  signals 
and  spectra  on  a  computer.  What  we  still  need  is  a  version  of  the 
Fourier  transform  applicable  to  discrete  signals. 

18.3.1  Definition  of  the  DFT 

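The discrete Fourier transform has, just like its continuous counterpart, the same form in both directions (apart from the sign of the exponent). For a discrete signal g(u) of length M (u = 0, …, M−1), the forward transform (DFT) is defined as

$$G(m) = \frac{1}{\sqrt{M}} \sum_{u=0}^{M-1} g(u) \cdot e^{-\mathrm{i}\,2\pi\frac{mu}{M}} \qquad (18.44)$$

$$\phantom{G(m)} = \frac{1}{\sqrt{M}} \sum_{u=0}^{M-1} g(u) \cdot \Bigl[\cos\Bigl(2\pi\frac{mu}{M}\Bigr) - \mathrm{i}\cdot\sin\Bigl(2\pi\frac{mu}{M}\Bigr)\Bigr] \qquad (18.45)$$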

structed)  at  a  relatively  low  sampling  frequency,  even  well  below  the 
maximum signal frequency. This is possible because one can also use a filter with suitably low bandwidth for reconstructing the original signal. For example, it may be sufficient to strike (i.e., “sample”) a church bell (a low-bandwidth oscillatory system with small internal damping) to uniquely generate a sound wave of relatively high frequency.

9 Notation: We use g(x), G(ω) for a continuous signal or spectrum, respectively, and g(u), G(m) for the discrete versions.




18  Introduction  to 
Spectral  Techniques 

Fig. 18.9

Transition from continuous to discrete periodic functions (illustration only).
(e) Discrete nonperiodic signal with samples spaced at τ_s
(f) Continuous periodic spectrum with period ω_s = 2π/τ_s
(g) Discrete periodic signal with samples …
(h) Discrete periodic spectrum with values spaced at ω_0 = 2π/τ_0 and period ω_s = 2π/τ_s = ω_0·M
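for 0 ≤ m < M, and the inverse transform (DFT⁻¹) is¹⁰

$$g(u) = \frac{1}{\sqrt{M}} \sum_{m=0}^{M-1} G(m) \cdot e^{\mathrm{i}\,2\pi\frac{mu}{M}} \qquad (18.46)$$

$$\phantom{g(u)} = \frac{1}{\sqrt{M}} \sum_{m=0}^{M-1} G(m) \cdot \Bigl[\cos\Bigl(2\pi\frac{mu}{M}\Bigr) + \mathrm{i}\cdot\sin\Bigl(2\pi\frac{mu}{M}\Bigr)\Bigr] \qquad (18.47)$$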


for 0 ≤ u < M. Note that both the signal g(u) and the discrete spectrum G(m) are complex-valued vectors of length M, that is,

$$g(u) = g_{\mathrm{Re}}(u) + \mathrm{i}\cdot g_{\mathrm{Im}}(u), \qquad G(m) = G_{\mathrm{Re}}(m) + \mathrm{i}\cdot G_{\mathrm{Im}}(m), \qquad (18.48)$$

for u, m = 0, …, M−1. A numerical example for a DFT with M = 10 is shown in Fig. 18.10. Converting Eqn. (18.44) from Euler’s exponential notation (Eqn. (18.10)), we obtain the discrete Fourier spectrum in component notation as

$$G(m) = \frac{1}{\sqrt{M}} \sum_{u=0}^{M-1} \bigl[g_{\mathrm{Re}}(u) + \mathrm{i}\cdot g_{\mathrm{Im}}(u)\bigr] \cdot \bigl[C_m^M(u) - \mathrm{i}\cdot S_m^M(u)\bigr], \qquad (18.49)$$

where we denote as C_m^M and S_m^M the discrete (cosine and sine) basis functions, as described in the next section. Applying the usual complex multiplication,¹¹ we obtain the real and imaginary parts of the discrete Fourier spectrum as


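$$G_{\mathrm{Re}}(m) = \frac{1}{\sqrt{M}} \sum_{u=0}^{M-1} \bigl[g_{\mathrm{Re}}(u)\cdot C_m^M(u) + g_{\mathrm{Im}}(u)\cdot S_m^M(u)\bigr], \qquad (18.50)$$

$$G_{\mathrm{Im}}(m) = \frac{1}{\sqrt{M}} \sum_{u=0}^{M-1} \bigl[g_{\mathrm{Im}}(u)\cdot C_m^M(u) - g_{\mathrm{Re}}(u)\cdot S_m^M(u)\bigr], \qquad (18.51)$$

for m = 0, …, M−1. Analogously, the inverse DFT in Eqn. (18.46) expands to

$$g_{\mathrm{Re}}(u) = \frac{1}{\sqrt{M}} \sum_{m=0}^{M-1} \bigl[G_{\mathrm{Re}}(m)\cdot C_u^M(m) - G_{\mathrm{Im}}(m)\cdot S_u^M(m)\bigr], \qquad (18.52)$$

$$g_{\mathrm{Im}}(u) = \frac{1}{\sqrt{M}} \sum_{m=0}^{M-1} \bigl[G_{\mathrm{Im}}(m)\cdot C_u^M(m) + G_{\mathrm{Re}}(m)\cdot S_u^M(m)\bigr], \qquad (18.53)$$

for u = 0, …, M−1.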
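Eqns. (18.50)–(18.53) translate almost literally into code that keeps real and imaginary parts in separate arrays. The Java sketch below is one such transcription, assuming a plain O(M²) evaluation; the class and method names are hypothetical. For the real signal g = (1, 2, 3, 4), the forward transform gives G = (5, −1 + i, −1, −1 − i) (G(3) is the complex conjugate of G(1), as expected for a real signal), and the inverse transform returns the original values up to rounding.

```java
/** Hypothetical transcription of Eqns. (18.50)-(18.53) using separate real/imaginary arrays. */
final class DftComponents {

    /** Forward DFT (Eqns. 18.50, 18.51); returns {GRe, GIm}. */
    static double[][] forward(double[] gRe, double[] gIm) {
        int M = gRe.length;
        double s = 1 / Math.sqrt(M);
        double[] GRe = new double[M], GIm = new double[M];
        for (int m = 0; m < M; m++) {
            for (int u = 0; u < M; u++) {
                double c = Math.cos(2 * Math.PI * m * u / M);   // C_m^M(u)
                double sn = Math.sin(2 * Math.PI * m * u / M);  // S_m^M(u)
                GRe[m] += gRe[u] * c + gIm[u] * sn;
                GIm[m] += gIm[u] * c - gRe[u] * sn;
            }
            GRe[m] *= s;
            GIm[m] *= s;
        }
        return new double[][] {GRe, GIm};
    }

    /** Inverse DFT (Eqns. 18.52, 18.53); returns {gRe, gIm}. */
    static double[][] inverse(double[] GRe, double[] GIm) {
        int M = GRe.length;
        double s = 1 / Math.sqrt(M);
        double[] gRe = new double[M], gIm = new double[M];
        for (int u = 0; u < M; u++) {
            for (int m = 0; m < M; m++) {
                double c = Math.cos(2 * Math.PI * m * u / M);   // C_u^M(m) = C_m^M(u)
                double sn = Math.sin(2 * Math.PI * m * u / M);  // S_u^M(m) = S_m^M(u)
                gRe[u] += GRe[m] * c - GIm[m] * sn;
                gIm[u] += GIm[m] * c + GRe[m] * sn;
            }
            gRe[u] *= s;
            gIm[u] *= s;
        }
        return new double[][] {gRe, gIm};
    }

    public static void main(String[] args) {
        double[][] G = forward(new double[] {1, 2, 3, 4}, new double[4]);
        for (int m = 0; m < 4; m++) {
            System.out.printf("G(%d) = %.3f %+.3f i%n", m, G[0][m], G[1][m]);
        }
    }
}
```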

10  Compare  these  definitions  with  the  corresponding  expressions  for  the 
continuous  forward  and  inverse  Fourier  transforms  in  Eqns.  (18.19)  and 
(18.20),  respectively. 

11 See also Sec. A.3 in the Appendix.
18.3.2 Discrete Basis Functions

The inverse DFT (Eqn. (18.46)) performs the decomposition of the discrete function g(u) into a finite sum of M discrete cosine and sine functions (C_m^M, S_m^M) whose weights (or “amplitudes”) are determined by the DFT coefficients in G(m). Each of these 1D basis functions (first used in Eqn. (18.49)),

$$C_m^M(u) = C_u^M(m) = \cos\Bigl(2\pi\frac{mu}{M}\Bigr), \qquad (18.54)$$

$$S_m^M(u) = S_u^M(m) = \sin\Bigl(2\pi\frac{mu}{M}\Bigr), \qquad (18.55)$$

is periodic with M and has a discrete frequency (wave number) m, which corresponds to the angular frequency

$$\omega_m = 2\pi\cdot\frac{m}{M}. \qquad (18.56)$$

For example, Figs. 18.11 and 18.12 show the discrete basis functions (at integer positions u ∈ ℤ) for the DFT of length M = 8 as well as their continuous counterparts (at real-valued positions x ∈ ℝ).

For wave number m = 0, the cosine function C_0^M(u) (Eqn. (18.54)) has the constant value 1. The corresponding DFT coefficient G_Re(0), the real part of G(0), thus specifies the constant part of the signal g(u) in Eqn. (18.52), which is proportional to its average value (with the scale factor 1/√M used here, G_Re(0) equals √M times the mean of g_Re(u)). In contrast, the zero-frequency sine function S_0^M(u) is zero for any value of u and thus cannot contribute anything to the signal. The corresponding DFT coefficients G_Im(0) in Eqn. (18.52) and G_Re(0) in Eqn. (18.53) are therefore of no relevance. For a real-valued signal (i.e., g_Im(u) = 0 for all u), the coefficient G_Im(0) in the corresponding Fourier spectrum must also be zero.

As seen in Fig. 18.11, the wave number m = 1 relates to a cosine or sine function that performs exactly one full cycle over the signal length M = 8. Similarly, the wave numbers m = 2, …, 7 correspond to 2, …, 7 complete cycles over the signal length M (see Figs. 18.11 and 18.12).
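The discrete basis functions of Eqns. (18.54)–(18.56) are easy to tabulate for experimentation. This hypothetical Java helper (names are illustrative, not from this book) returns C_m^M and S_m^M as arrays; printing them for M = 8 gives the sample values marked by the round dots in Figs. 18.11 and 18.12.

```java
/** Hypothetical generator for the discrete DFT basis functions (Eqns. 18.54-18.56). */
final class DftBasis {

    /** C_m^M(u) = cos(2*pi*m*u/M) for u = 0..M-1. */
    static double[] cosine(int m, int M) {
        double[] c = new double[M];
        for (int u = 0; u < M; u++) {
            c[u] = Math.cos(2 * Math.PI * m * u / M);
        }
        return c;
    }

    /** S_m^M(u) = sin(2*pi*m*u/M) for u = 0..M-1. */
    static double[] sine(int m, int M) {
        double[] s = new double[M];
        for (int u = 0; u < M; u++) {
            s[u] = Math.sin(2 * Math.PI * m * u / M);
        }
        return s;
    }

    /** Angular frequency omega_m = 2*pi*m/M (Eqn. 18.56), in radians per sample. */
    static double omega(int m, int M) {
        return 2 * Math.PI * m / M;
    }

    public static void main(String[] args) {
        int M = 8;
        for (int m = 0; m <= 3; m++) {
            System.out.println("C_" + m + ": " + java.util.Arrays.toString(cosine(m, M)));
            System.out.println("S_" + m + ": " + java.util.Arrays.toString(sine(m, M)));
        }
    }
}
```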


18.3.3  Aliasing  Again! 

A closer look at Figs. 18.11 and 18.12 reveals an interesting fact: the sampled (discrete) cosine functions for m = 3 and m = 5 are identical, and the corresponding sine functions differ only in sign, although their continuous counterparts are different! The same is true for the frequency pairs m = 2, 6 and m = 1, 7. What we see here is another manifestation of the sampling theorem, which we had originally encountered (Sec. 18.2.1) in frequency space, now in signal space. Obviously, m = 4 is the maximum frequency component that can be represented by a discrete signal of length M = 8. Any discrete function with a higher frequency (m = 5, …, 7 in this case) has a counterpart with a lower wave number (identical for the cosine, sign-inverted for the sine) and thus cannot be reconstructed from the sampled signal (see also Fig. 18.13)!
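The coincidences visible in Figs. 18.11–18.13 can be verified numerically. The following Java sketch (hypothetical names) compares sampled basis functions of length M = 8: the cosines for m = 3 and m = 5 coincide exactly, whereas the sines coincide only after a change of sign, since sin(2π·5u/8) = sin(2πu − 2π·3u/8) = −sin(2π·3u/8); for m = 1 and m = 9, which differ by M, both cosine and sine are identical.

```java
/** Hypothetical check of aliasing between discrete basis functions of length M. */
final class BasisAliasing {

    /** Largest |f(2*pi*m1*u/M) - sign * f(2*pi*m2*u/M)| over u = 0..M-1, with f = sin or cos. */
    static double maxDiff(int m1, int m2, int M, boolean useSine, double sign) {
        double d = 0;
        for (int u = 0; u < M; u++) {
            double a1 = 2 * Math.PI * m1 * u / M;
            double a2 = 2 * Math.PI * m2 * u / M;
            double v1 = useSine ? Math.sin(a1) : Math.cos(a1);
            double v2 = useSine ? Math.sin(a2) : Math.cos(a2);
            d = Math.max(d, Math.abs(v1 - sign * v2));
        }
        return d;
    }

    public static void main(String[] args) {
        int M = 8;
        System.out.println("cos m=3 vs m=5:  " + maxDiff(3, 5, M, false, +1));  // zero up to rounding
        System.out.println("sin m=3 vs -m=5: " + maxDiff(3, 5, M, true, -1));   // zero up to rounding
        System.out.println("sin m=1 vs m=9:  " + maxDiff(1, 9, M, true, +1));   // zero up to rounding
        System.out.println("sin m=3 vs m=5:  " + maxDiff(3, 5, M, true, +1));   // clearly nonzero
    }
}
```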

If a continuous signal is sampled at a regular distance τ, the corresponding Fourier spectrum is repeated at multiples of ω_s = 2π/τ,



Fig. 18.11

Discrete basis functions C_m^M(u) and S_m^M(u) for the signal length M = 8 and wave numbers m = 0, …, 3. Each plot shows both the discrete function (round dots) and the corresponding continuous function.
Fig. 18.12

Discrete basis functions (continued). Signal length M = 8 and wave numbers m = 4, …, 7. Notice that, for example, the discrete cosine functions for m = 5 and m = 3 (Fig. 18.11) are identical (and the discrete sine functions differ only in sign) because m = 4 is the maximum wave number that can be represented in a discrete spectrum of length M = 8.
as we have shown earlier (Fig. 18.8). In the discrete case, the spectrum is periodic with length M. Since the Fourier spectrum of a real-valued signal is symmetric about the origin (Eqn. (18.21)), for every coefficient with wave number m there is a duplicate of equal magnitude with wave number −m. Thus the spectral components appear pairwise and mirrored at multiples of M; that is,



Fig. 18.13

Aliasing in signal space. For the signal length M = 8, the discrete cosine and sine basis functions for the wave numbers m = 1, 9, 17, … (round dots) are all identical. The sampling frequency itself corresponds to the wave number m = 8.


$$|G(m)| = |G(M-m)| = |G(M+m)| = |G(2M-m)| = |G(2M+m)| = \cdots = |G(kM-m)| = |G(kM+m)|, \qquad (18.57)$$

for all k ∈ ℤ. If the original continuous signal contains “energy” at the frequencies

$$\omega_m > \omega_{M/2}$$

(i.e., signal components with wave numbers m > M/2), then, according to the sampling theorem, the overlapping parts of the spectra are superimposed in the resulting periodic spectrum of the discrete signal.
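Because the DFT sum of Eqn. (18.44) can be evaluated for any integer wave number, the periodicity and mirror symmetry of Eqn. (18.57) can be checked directly. The Java sketch below (hypothetical names) evaluates |G(m)| for a real-valued signal at arbitrary integer m, including negative values and values beyond M.

```java
/** Hypothetical check of Eqn. (18.57): |G(m)| = |G(kM - m)| = |G(kM + m)| for real signals. */
final class SpectrumSymmetry {

    /** Magnitude |G(m)| of the forward DFT (Eqn. 18.44) of a real signal, for any integer m. */
    static double magnitude(double[] g, int m) {
        int M = g.length;
        double re = 0, im = 0;
        for (int u = 0; u < M; u++) {
            double phi = 2 * Math.PI * m * u / M;
            re += g[u] * Math.cos(phi);
            im -= g[u] * Math.sin(phi);
        }
        return Math.hypot(re, im) / Math.sqrt(M);
    }

    public static void main(String[] args) {
        double[] g = {3, 1, 4, 1, 5, 9, 2, 6};  // arbitrary real-valued test signal, M = 8
        int M = g.length, m = 3;
        System.out.printf("|G(m)| = %.6f%n", magnitude(g, m));
        for (int k = -1; k <= 2; k++) {
            System.out.printf("k=%2d: |G(kM-m)| = %.6f, |G(kM+m)| = %.6f%n",
                    k, magnitude(g, k * M - m), magnitude(g, k * M + m));
        }
    }
}
```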


18.3.4  Units  in  Signal  and  Frequency  Space 

The relation between the units in signal and frequency space and the interpretation of wave numbers m is a common cause of confusion. While the discrete signal and its spectrum are simple numerical vectors and units of measurement are irrelevant for computing the DFT itself, it is nevertheless important to understand how the coordinates in the spectrum relate to physical dimensions in the real world.

Clearly, every complex-valued spectral coefficient G(m) corresponds to one pair of cosine and sine functions with a particular frequency in signal space. Assume a continuous signal is sampled at M consecutive positions spaced at τ (an interval in time or distance in space). The wave number m = 1 then corresponds to the fundamental period of the discrete signal (which is now assumed to be periodic) with a period of length Mτ; that is, to the frequency

$$f_1 = \frac{1}{M\tau}. \qquad (18.58)$$


In general, the wave number m of a discrete spectrum relates to the physical frequency as

$$f_m = m \cdot \frac{1}{M\tau} = m \cdot f_1, \qquad (18.59)$$

for 0 ≤ m < M, which is equivalent to the angular frequency

$$\omega_m = 2\pi f_m = m \cdot \frac{2\pi}{M\tau} = m \cdot \omega_1. \qquad (18.60)$$

Obviously then, the sampling frequency f_s = 1/τ = M·f_1 corresponds to the wave number m_s = M. As expected, the maximum nonaliased wave number in the spectrum is

$$m_{\max} = \frac{M}{2} = \frac{m_s}{2}, \qquad (18.61)$$

that is, half the sampling frequency index m_s.


Example  1:  time-domain  signal 

We assume for this example that g(u) is a signal in the time domain (e.g., a discrete sound signal) that contains M = 500 sample values taken at regular intervals τ = 1 ms = 10⁻³ s. Thus the sampling frequency is f_s = 1/τ = 1000 Hertz (cycles per second) and the total duration (fundamental period) of the signal is Mτ = 0.5 s.

The signal is implicitly periodic, and from Eqn. (18.58) we obtain its fundamental frequency as f_1 = 1/(500 · 10⁻³ s) = 1/(0.5 s) = 2 Hertz. The wave number m = 2 in this case corresponds to a real frequency f_2 = 2f_1 = 4 Hertz, f_3 = 6 Hertz, etc. The maximum frequency that can be represented by this discrete signal without aliasing is f_max = (M/2)·f_1 = 1/(2τ) = 500 Hertz, exactly half the sampling frequency f_s.

Example  2:  space-domain  signal 

Assume we have a 1D print pattern with a resolution (i.e., spatial sampling frequency) of 120 dots per cm, which equals approximately 300 dots per inch (dpi), and a total signal length of M = 1800 samples. This corresponds to a spatial sampling interval of τ = 1/120 cm ≈ 83 μm and a physical signal length of (1800/120) cm = 15 cm.

The fundamental frequency of this signal (again implicitly assumed to be periodic) is f_1 = 1/15, expressed in cycles per cm. The sampling frequency is f_s = 120 cycles per cm and thus the maximum signal frequency is f_max = f_s/2 = 60 cycles per cm. The maximum signal frequency specifies the finest structure (1/60 cm) that can be reproduced by this print raster.
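The conversions in Eqns. (18.58)–(18.61) fit in a few helper functions. The following hypothetical Java sketch reproduces the numbers of both examples; results are in cycles per whatever unit τ is given in (seconds in Example 1, centimeters in Example 2).

```java
/** Hypothetical helpers for Eqns. (18.58)-(18.61): wave number m <-> physical frequency. */
final class SpectralUnits {

    /** f_m = m / (M * tau), in cycles per unit of tau (Eqn. 18.59). */
    static double frequency(int m, int M, double tau) {
        return m / (M * tau);
    }

    /** omega_m = 2*pi*f_m (Eqn. 18.60). */
    static double angularFrequency(int m, int M, double tau) {
        return 2 * Math.PI * frequency(m, M, tau);
    }

    /** Largest non-aliased frequency, f_s / 2 = 1 / (2 * tau) (Eqn. 18.61). */
    static double maxFrequency(double tau) {
        return 1 / (2 * tau);
    }

    public static void main(String[] args) {
        // Example 1: M = 500 samples, tau = 1 ms
        System.out.println(frequency(1, 500, 1e-3));        // f_1, about 2 Hz
        System.out.println(frequency(3, 500, 1e-3));        // f_3, about 6 Hz
        System.out.println(maxFrequency(1e-3));             // f_max, about 500 Hz
        // Example 2: M = 1800 samples, tau = 1/120 cm
        System.out.println(frequency(1, 1800, 1.0 / 120));  // f_1, about 0.0667 cycles/cm
        System.out.println(maxFrequency(1.0 / 120));        // f_max, about 60 cycles/cm
    }
}
```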
18.3.5 Power Spectrum

The magnitude of the complex-valued Fourier spectrum,

$$|G(m)| = \sqrt{G_{\mathrm{Re}}^2(m) + G_{\mathrm{Im}}^2(m)}, \qquad (18.62)$$

is commonly called the “power spectrum” of a signal (much of the signal-processing literature reserves this term for the squared magnitude |G(m)|² and calls |G(m)| itself the magnitude or amplitude spectrum). It indicates how much energy the individual frequency components in the spectrum contribute to the signal. The power spectrum is real-valued and non-negative and thus often used for graphically displaying the results of Fourier transforms (see also Ch. 19, Sec. 19.2).

Since all phase information is lost in the power spectrum, the original signal cannot be reconstructed from the power spectrum alone. However, because of the missing phase information, the power spectrum is insensitive to shifts of the original signal and can thus be efficiently used for comparing signals. To be more precise, the power spectrum of a circularly shifted signal is identical to the power spectrum of the original signal. Thus, given a discrete periodic signal g_1(u) of length M and a second signal g_2(u) shifted by some offset d (with u − d taken modulo M), such that

$$g_2(u) = g_1(u-d), \qquad (18.63)$$

the corresponding power spectra are the same, that is,

$$|G_2(m)| = |G_1(m)|, \qquad (18.64)$$

although in general the complex-valued spectra G_1(m) and G_2(m) are different. Furthermore, from the symmetry property of the Fourier spectrum, it follows that

$$|G(m)| = |G(-m)|, \qquad (18.65)$$

for real-valued signals g(u) ∈ ℝ.
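The shift invariance expressed by Eqn. (18.64) is easy to confirm numerically. The Java sketch below (hypothetical names; a direct O(M²) DFT for brevity) circularly shifts a real signal and compares the magnitude spectra |G(m)|, which is what this section calls the power spectrum.

```java
/** Hypothetical check of Eqns. (18.63)-(18.64): circular shifts leave |G(m)| unchanged. */
final class ShiftInvariance {

    /** Magnitude spectrum |G(m)|, m = 0..M-1, of a real signal via the direct DFT (Eqn. 18.44). */
    static double[] magnitudeSpectrum(double[] g) {
        int M = g.length;
        double[] mag = new double[M];
        for (int m = 0; m < M; m++) {
            double re = 0, im = 0;
            for (int u = 0; u < M; u++) {
                double phi = 2 * Math.PI * m * u / M;
                re += g[u] * Math.cos(phi);
                im -= g[u] * Math.sin(phi);
            }
            mag[m] = Math.hypot(re, im) / Math.sqrt(M);
        }
        return mag;
    }

    /** g2(u) = g1(u - d), with indices taken modulo M (circular shift, Eqn. 18.63). */
    static double[] circularShift(double[] g1, int d) {
        int M = g1.length;
        double[] g2 = new double[M];
        for (int u = 0; u < M; u++) {
            g2[u] = g1[Math.floorMod(u - d, M)];
        }
        return g2;
    }

    public static void main(String[] args) {
        double[] g1 = {1, 5, 2, 0, 7, 3};
        double[] g2 = circularShift(g1, 2);
        System.out.println(java.util.Arrays.toString(magnitudeSpectrum(g1)));
        System.out.println(java.util.Arrays.toString(magnitudeSpectrum(g2)));  // same values up to rounding
    }
}
```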


18.4  Implementing  the  DFT 

18.4.1  Direct  Implementation 

Based on the definitions in Eqns. (18.50) and (18.51), the DFT can be directly implemented, as shown in Prog. 18.1. The main method DFT() transforms a signal vector of arbitrary length M (not necessarily a power of 2). It requires roughly M² operations (multiplications and additions); that is, the time complexity of this DFT algorithm is O(M²).

One way to improve the efficiency of the DFT algorithm is to use lookup tables for the sin and cos functions (which are relatively “expensive” to compute), since only function values for a set of M different angles φ_m are ever needed. The angles φ_m = 2π·m/M corresponding to m = 0, …, M−1 are evenly distributed over the full 360° circle. Any integral multiple φ_m·u (for u ∈ ℤ) can only fall onto one of these angles again (modulo 2π) because
Prog. 18.1

Direct implementation of the DFT based on the definition in Eqns. (18.50) and (18.51). The method DFT() returns a complex-valued vector with the same length as the complex-valued input (signal) vector g. This method implements both the forward and the inverse transforms, controlled by the Boolean parameter forward. The class Complex defines the structure of the complex-valued vector elements.


    class Complex {
        double re, im;

        Complex(double re, double im) {  // constructor method
            this.re = re;
            this.im = im;
        }
    }

    Complex[] DFT(Complex[] g, boolean forward) {
        int M = g.length;
        double s = 1 / Math.sqrt(M);  // common scale factor
        Complex[] G = new Complex[M];
        for (int m = 0; m < M; m++) {
            double sumRe = 0;
            double sumIm = 0;
            double phim = 2 * Math.PI * m / M;
            for (int u = 0; u < M; u++) {
                double gRe = g[u].re;
                double gIm = g[u].im;
                double cosw = Math.cos(phim * u);
                double sinw = Math.sin(phim * u);
                if (!forward)  // inverse transform
                    sinw = -sinw;
                // complex multiplication: [gRe + i*gIm] * [cosw - i*sinw]
                sumRe += gRe * cosw + gIm * sinw;
                sumIm += gIm * cosw - gRe * sinw;
            }
            G[m] = new Complex(s * sumRe, s * sumIm);
        }
        return G;
    }

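$$\varphi_m \cdot u = 2\pi\frac{mu}{M} \equiv \frac{2\pi}{M}\cdot(mu \bmod M) = 2\pi\frac{k}{M} = \varphi_k \pmod{2\pi}, \quad\text{with } 0 \le k = (mu \bmod M) < M, \qquad (18.66)$$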

where mod denotes the “modulus” operator.¹² Thus we can set up two constant tables (floating-point arrays) W_C and W_S of size M with the values

$$W_C(k) \leftarrow \cos(\varphi_k) = \cos\Bigl(2\pi\frac{k}{M}\Bigr), \qquad (18.67)$$

$$W_S(k) \leftarrow \sin(\varphi_k) = \sin\Bigl(2\pi\frac{k}{M}\Bigr), \qquad (18.68)$$

for 0 ≤ k < M. For computing the DFT, the necessary cosine and sine values (Eqn. (18.49)) can be read from these tables as

$$C_m^M(u) = \cos\Bigl(2\pi\frac{mu}{M}\Bigr) = W_C(mu \bmod M), \qquad (18.69)$$

$$S_m^M(u) = \sin\Bigl(2\pi\frac{mu}{M}\Bigr) = W_S(mu \bmod M), \qquad (18.70)$$

for arbitrary values of m, u ∈ ℤ, without any additional computation. The necessary modification of the DFT() method in Prog. 18.1 is left as an exercise (Exercise 18.5).
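A minimal Java sketch of the two lookup tables and the indexing of Eqns. (18.69)–(18.70), with hypothetical names. Two practical details are assumptions of this sketch rather than part of the text: the product m·u is formed as a long to avoid integer overflow for long signals, and Math.floorMod is used because Java’s % operator yields negative results for negative operands, while the modulus in Eqn. (18.66) is taken to be non-negative. Building the tables into the DFT() method of Prog. 18.1 is still left to Exercise 18.5.

```java
/** Hypothetical cosine/sine lookup tables for a DFT of length M (Eqns. 18.67-18.70). */
final class TrigTables {
    final int M;
    final double[] WC, WS;   // WC[k] = cos(2*pi*k/M), WS[k] = sin(2*pi*k/M)

    TrigTables(int M) {
        this.M = M;
        WC = new double[M];
        WS = new double[M];
        for (int k = 0; k < M; k++) {
            WC[k] = Math.cos(2 * Math.PI * k / M);
            WS[k] = Math.sin(2 * Math.PI * k / M);
        }
    }

    /** Table index k = (m * u) mod M, non-negative for any integers m, u. */
    int index(int m, int u) {
        return (int) Math.floorMod((long) m * u, (long) M);
    }

    /** C_m^M(u) read from the table (Eqn. 18.69). */
    double cos(int m, int u) {
        return WC[index(m, u)];
    }

    /** S_m^M(u) read from the table (Eqn. 18.70). */
    double sin(int m, int u) {
        return WS[index(m, u)];
    }

    public static void main(String[] args) {
        TrigTables t = new TrigTables(8);
        System.out.println(t.cos(3, 5) + " vs " + Math.cos(2 * Math.PI * 3 * 5 / 8));
        System.out.println(t.sin(-2, 3) + " vs " + Math.sin(2 * Math.PI * -2 * 3 / 8));
    }
}
```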

Despite this significant improvement, the direct implementation of the DFT remains computationally intensive. As a matter of fact, for a long time it was impossible to compute this form of the DFT in sufficiently short time on off-the-shelf computers, and this is still true today for many real applications.

12 See also Sec. F.1.2 in the Appendix.

18.4.2  Fast  Fourier  Transform  (FFT) 

Fortunately, for computing the DFT in practice, fast algorithms exist that lay out the sequence of computations in such a way that intermediate results are only computed once and optimally reused many times. This “fast Fourier transform” (FFT), which exists in many variations, generally reduces the time complexity of the computation from O(M²) to O(M log₂ M). The benefits are substantial, in particular for longer signals. For example, judging by the ratio M²/(M log₂ M) = M/log₂ M (and ignoring constant factors), a signal of length M = 10³ yields a speedup of roughly 100 over the direct DFT implementation, and a signal of length M = 10⁶ an impressive gain of roughly 50,000. Since its invention, the FFT has therefore become an indispensable tool in almost any application of spectral signal analysis [34].

Most FFT algorithms, including the one described in the famous publication by Cooley and Tukey in 1965 (see [88, p. 156] for a historic overview), are designed for signals of length M = 2^k (i.e., powers of 2). However, FFT algorithms have also been developed for other lengths, including several small prime numbers [25]. Efficient Java implementations are available, for example, as part of the JTransforms library¹³ by Piotr Wendykier [255] or the Apache Commons Math library.¹⁴

It is important to remember, though, that the DFT and FFT compute exactly the same result (apart from differences in floating-point rounding) and the FFT is only a special, though ingenious, method for implementing the discrete Fourier transform (Eqn. (18.44)).
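To illustrate that the FFT merely reorganizes the computation, the following Java sketch implements one common variant, the recursive radix-2 decimation-in-time scheme, for M = 2^k, using the same 1/√M scaling as Eqn. (18.44), and compares it with the direct DFT. It is written for readability (it allocates new arrays at every recursion level) and is not the algorithm used by any particular library; all names are hypothetical.

```java
/** Hypothetical recursive radix-2 FFT (M must be a power of 2), compared with the direct DFT. */
final class SimpleFft {

    /** Unscaled forward transform: X(m) = sum_u x(u) * exp(-i*2*pi*m*u/M). Returns {re, im}. */
    static double[][] fftUnscaled(double[] re, double[] im) {
        int M = re.length;
        if (M == 1) {
            return new double[][] {{re[0]}, {im[0]}};
        }
        int h = M / 2;
        double[] evRe = new double[h], evIm = new double[h], odRe = new double[h], odIm = new double[h];
        for (int k = 0; k < h; k++) {            // split into even- and odd-indexed samples
            evRe[k] = re[2 * k];     evIm[k] = im[2 * k];
            odRe[k] = re[2 * k + 1]; odIm[k] = im[2 * k + 1];
        }
        double[][] E = fftUnscaled(evRe, evIm);  // DFT of even samples (length M/2)
        double[][] O = fftUnscaled(odRe, odIm);  // DFT of odd samples (length M/2)
        double[] outRe = new double[M], outIm = new double[M];
        for (int m = 0; m < h; m++) {
            double phi = -2 * Math.PI * m / M;   // twiddle factor exp(-i*2*pi*m/M)
            double c = Math.cos(phi), s = Math.sin(phi);
            double tRe = c * O[0][m] - s * O[1][m];
            double tIm = c * O[1][m] + s * O[0][m];
            outRe[m] = E[0][m] + tRe;      outIm[m] = E[1][m] + tIm;
            outRe[m + h] = E[0][m] - tRe;  outIm[m + h] = E[1][m] - tIm;
        }
        return new double[][] {outRe, outIm};
    }

    /** Forward FFT with the 1/sqrt(M) scaling of Eqn. (18.44). */
    static double[][] fft(double[] re, double[] im) {
        int M = re.length;
        if (M == 0 || (M & (M - 1)) != 0) {
            throw new IllegalArgumentException("length must be a power of 2");
        }
        double[][] X = fftUnscaled(re, im);
        double s = 1 / Math.sqrt(M);
        for (int m = 0; m < M; m++) {
            X[0][m] *= s;
            X[1][m] *= s;
        }
        return X;
    }

    /** Direct O(M^2) DFT (Eqn. 18.44), for comparison. */
    static double[][] dft(double[] re, double[] im) {
        int M = re.length;
        double s = 1 / Math.sqrt(M);
        double[] GRe = new double[M], GIm = new double[M];
        for (int m = 0; m < M; m++) {
            for (int u = 0; u < M; u++) {
                double phi = 2 * Math.PI * m * u / M;
                GRe[m] += s * (re[u] * Math.cos(phi) + im[u] * Math.sin(phi));
                GIm[m] += s * (im[u] * Math.cos(phi) - re[u] * Math.sin(phi));
            }
        }
        return new double[][] {GRe, GIm};
    }

    public static void main(String[] args) {
        double[] re = {1, 2, 3, 4, 0, -1, 2, 5}, im = new double[8];
        double[][] a = fft(re, im), b = dft(re, im);
        double err = 0;
        for (int m = 0; m < re.length; m++) {
            err = Math.max(err, Math.hypot(a[0][m] - b[0][m], a[1][m] - b[1][m]));
        }
        System.out.println("max |FFT - DFT| = " + err);  // zero up to rounding error
    }
}
```

For production use, one of the libraries mentioned above is the better choice; note that a library’s scaling convention may differ from the symmetric 1/√M used in this chapter, so its results may need rescaling.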


18.5  Exercises 

Exercise 18.1. Calculate the values of the cosine function f(x) = cos(ωx) with angular frequency ω = 5 for the positions x = −3, −2, …, 2, 3. What is the length of this function’s period?

Exercise 18.2. Determine the phase angle φ of the function f(x) = A·cos(ωx) + B·sin(ωx) for A = −1 and B = 2.

Exercise 18.3. Calculate the real part, the imaginary part, and the magnitude of the complex value z = 1.5·e^(−i·2.5).

Exercise 18.4. A 1D optical scanner for sampling film transparencies is supposed to resolve image structures with a precision of 4,000 dpi. What spatial distance (in mm) between samples is required such that no aliasing occurs?

Exercise 18.5. Modify the direct implementation of the 1D DFT given in Prog. 18.1 by using lookup tables for the cos and sin functions as described in Eqns. (18.69)–(18.70).

13  http://sites.google.com/site/piotrwendykier/software/jtransforms. 

14 http://commons.apache.org/math/ (class FastFourierTransformer).
The Discrete Fourier Transform in 2D


The Fourier transform is defined not only for 1D signals but for functions of arbitrary dimension. Thus, 2D images are nothing special from a mathematical point of view.


19.1  Definition  of  the  2D  DFT 

For a 2D, periodic function (e.g., an intensity image) g(u,v) of size M × N, the discrete Fourier transform (2D DFT) is defined as

$$G(m,n) = \frac{1}{\sqrt{MN}} \sum_{u=0}^{M-1}\sum_{v=0}^{N-1} g(u,v)\cdot e^{-\mathrm{i}\,2\pi\frac{mu}{M}} \cdot e^{-\mathrm{i}\,2\pi\frac{nv}{N}} \qquad (19.1)$$

$$\phantom{G(m,n)} = \frac{1}{\sqrt{MN}} \sum_{u=0}^{M-1}\sum_{v=0}^{N-1} g(u,v)\cdot e^{-\mathrm{i}\,2\pi\left(\frac{mu}{M}+\frac{nv}{N}\right)} \qquad (19.2)$$

for the spectral coordinates m = 0, …, M−1 and n = 0, …, N−1. As we see, the resulting Fourier transform is again a 2D function of the same size (M × N) as the original signal. Similarly, the inverse 2D DFT is defined as


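$$g(u,v) = \frac{1}{\sqrt{MN}} \sum_{m=0}^{M-1}\sum_{n=0}^{N-1} G(m,n)\cdot e^{\mathrm{i}\,2\pi\frac{mu}{M}} \cdot e^{\mathrm{i}\,2\pi\frac{nv}{N}} \qquad (19.3)$$

$$\phantom{g(u,v)} = \frac{1}{\sqrt{MN}} \sum_{m=0}^{M-1}\sum_{n=0}^{N-1} G(m,n)\cdot e^{\mathrm{i}\,2\pi\left(\frac{mu}{M}+\frac{nv}{N}\right)} \qquad (19.4)$$

for the image coordinates u = 0, …, M−1 and v = 0, …, N−1.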


19.1.1  2D  Basis  Functions 


Equation  (19.4)  shows  that  a  discrete  2D,  periodic  function  g(u,v) 
can  be  represented  as  a  linear  combination  (i.e.,  as  a  weighted  sum) 
of  2D  sinusoids  of  the  form 
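$$e^{\mathrm{i}\,2\pi\left(\frac{mu}{M}+\frac{nv}{N}\right)} = \cos\Bigl(2\pi\Bigl(\frac{mu}{M}+\frac{nv}{N}\Bigr)\Bigr) + \mathrm{i}\cdot\sin\Bigl(2\pi\Bigl(\frac{mu}{M}+\frac{nv}{N}\Bigr)\Bigr) \qquad (19.5)$$

$$\qquad = C_{m,n}^{M,N}(u,v) + \mathrm{i}\cdot S_{m,n}^{M,N}(u,v), \qquad (19.6)$$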


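where C_{m,n}^{M,N}(u,v) and S_{m,n}^{M,N}(u,v) are discrete, 2D cosine and sine functions with horizontal and vertical wave numbers m and n, respectively, and the corresponding angular frequencies ω_m = 2πm/M and ω_n = 2πn/N, that is,

$$C_{m,n}^{M,N}(u,v) = \cos\Bigl(2\pi\Bigl(\frac{mu}{M}+\frac{nv}{N}\Bigr)\Bigr) = \cos(\omega_m u + \omega_n v), \qquad (19.7)$$

$$S_{m,n}^{M,N}(u,v) = \sin\Bigl(2\pi\Bigl(\frac{mu}{M}+\frac{nv}{N}\Bigr)\Bigr) = \sin(\omega_m u + \omega_n v). \qquad (19.8)$$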


Each  of  these  basis  functions  is  periodic  with  M  units  in  the  hori¬ 
zontal  direction  and  N  units  in  the  vertical  direction. 


Examples 

Figures 19.1 and 19.2 show a set of 2D cosine functions of size M × N = 16 × 16 for various combinations of wave numbers m, n = 0, …, 3. As we can clearly see, these functions correspond to a directed, cosine-shaped waveform whose orientation is determined by the wave numbers m and n. For example, the wave numbers m = n = 2 specify a cosine function C_{2,2}^{16,16}(u,v) that performs two full cycles in both the horizontal and vertical directions, thus creating a diagonally oriented, 2D wave. Of course, the same holds for the corresponding sine functions.
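The 2D basis functions of Eqns. (19.7) and (19.8) can be generated just like their 1D counterparts. The hypothetical Java helper below (names are illustrative) fills an array with C_{m,n}^{M,N}(u,v) or S_{m,n}^{M,N}(u,v), stored as [v][u] so that u runs horizontally; its main method prints a coarse text rendering of C_{2,2}^{16,16}, the diagonal wave described above.

```java
/** Hypothetical generator for 2D DFT basis functions (Eqns. 19.7, 19.8), stored as [v][u]. */
final class Basis2D {

    /** C_{m,n}^{M,N}(u,v) = cos(2*pi*(m*u/M + n*v/N)) for u = 0..M-1, v = 0..N-1. */
    static double[][] cosine(int m, int n, int M, int N) {
        double[][] c = new double[N][M];
        for (int v = 0; v < N; v++) {
            for (int u = 0; u < M; u++) {
                c[v][u] = Math.cos(2 * Math.PI * ((double) m * u / M + (double) n * v / N));
            }
        }
        return c;
    }

    /** S_{m,n}^{M,N}(u,v) = sin(2*pi*(m*u/M + n*v/N)). */
    static double[][] sine(int m, int n, int M, int N) {
        double[][] s = new double[N][M];
        for (int v = 0; v < N; v++) {
            for (int u = 0; u < M; u++) {
                s[v][u] = Math.sin(2 * Math.PI * ((double) m * u / M + (double) n * v / N));
            }
        }
        return s;
    }

    public static void main(String[] args) {
        double[][] c = cosine(2, 2, 16, 16);   // diagonal wave, two cycles in each direction
        for (double[] row : c) {
            StringBuilder sb = new StringBuilder();
            for (double x : row) {
                sb.append(x > 0.5 ? '#' : x < -0.5 ? '.' : ' ');
            }
            System.out.println(sb);
        }
    }
}
```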


19.1.2  Implementing  the  2D  DFT 

As in the 1D case, we could directly use the definition in Eqn. (19.2) to write a program or procedure that implements the 2D DFT. However, this is not even necessary. A minor rearrangement of Eqn. (19.2) into

$$G(m,n) = \frac{1}{\sqrt{N}} \sum_{v=0}^{N-1} \Biggl[\underbrace{\frac{1}{\sqrt{M}} \sum_{u=0}^{M-1} g(u,v)\cdot e^{-\mathrm{i}\,2\pi\frac{mu}{M}}}_{\text{1D DFT of row } g(\cdot,v)}\Biggr] \cdot e^{-\mathrm{i}\,2\pi\frac{nv}{N}} \qquad (19.9)$$

shows that its core contains a 1D DFT (see Eqn. (18.44)) of the vth row vector g(·,v) that is independent of the “vertical” position v and size N (noting the fact that v and N are placed outside the square brackets in Eqn. (19.9)). If, in a first step, we replace each row vector g(·,v) of the original image by its 1D Fourier transform,

$$g'(\cdot,v) \leftarrow \mathrm{DFT}\bigl(g(\cdot,v)\bigr) \quad\text{for } 0 \le v < N, \qquad (19.10)$$

then we only need to replace each resulting column vector by its 1D DFT in a second step:

$$g''(u,\cdot) \leftarrow \mathrm{DFT}\bigl(g'(u,\cdot)\bigr) \quad\text{for } 0 \le u < M. \qquad (19.11)$$
The resulting function g''(u,v) is precisely the 2D Fourier transform G(m,n). Thus the 2D DFT can be separated into a sequence of 1D

Fig. 19.1

2D cosine functions C_{m,n}^{M,N}(u,v) = cos(2π(mu/M + nv/N)) for M = N = 16, n = 0, …, 3, m = 0, 1.


DFTs over the row and column vectors, respectively, as summarized in Alg. 19.1. The advantage of this is twofold: first, the 2D DFT can be implemented more efficiently, and second, only a 1D implementation of the DFT (or the 1D FFT, as described in Ch. 18, Sec. 18.4.2) is needed to implement any multidimensional DFT.

As we can see from Eqn. (19.9), the 2D DFT could equally be performed in the opposite order, that is, by first doing a 1D DFT on all columns and subsequently on all rows. One should also note that all operations in Alg. 19.1 are done “in place”, which means that
Fig. 19.2

2D cosine functions (continued). C_{m,n}^{M,N}(u,v) = cos(2π(mu/M + nv/N)) for M = N = 16, n = 0, …, 3, m = 2, 3.


the original signal g(u,v) is destructively modified and successively replaced by its Fourier transform G(m,n) of the same size, without allocating storage for a second image-sized array (only a single row or column vector is needed temporarily). This feature is certainly desirable and also quite common, based on the fact that most 1D FFT algorithms (which should be used for implementing the DFT in practice) work “in place”.
Separable2dDft(g)    ▷ g(u,v) ∈ ℂ

Input: g, a 2D, discrete signal of size M × N, with g(u,v) ∈ ℂ. Returns the DFT for the 2D function g(u,v). The resulting spectrum G(m,n) has the same dimensions as g. The algorithm works “in place”, i.e., g is modified.

    (M, N) ← Size(g)
    for v ← 0, …, N−1 do
        r ← g(·, v)          ▷ extract the vth row vector of g
        g(·, v) ← DFT(r)     ▷ replace the vth row vector of g
    for u ← 0, …, M−1 do
        c ← g(u, ·)          ▷ extract the uth column vector of g
        g(u, ·) ← DFT(c)     ▷ replace the uth column vector of g
    Remark: g(u,v) ≡ G(m,n) now contains the discrete 2D Fourier spectrum.
    return g



Alg. 19.1

In-place computation of the 2D DFT as a sequence of 1D DFTs on row and column vectors.
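Alg. 19.1 translates into Java almost line by line. The sketch below is one possible rendering under stated assumptions: the image is stored as separate real and imaginary arrays indexed [v][u] (row v, column u), the 1D transform is the direct DFT with 1/√M scaling (in practice an FFT would be substituted), and all names are hypothetical. As in the algorithm, each column is copied into a temporary vector, transformed, and written back, so the input arrays end up holding G(m,n) at index [n][m].

```java
/** Hypothetical Java rendering of Alg. 19.1 (separable, in-place 2D DFT); arrays are [v][u]. */
final class Separable2dDft {

    /** Direct 1D DFT with 1/sqrt(M) scaling; overwrites (re, im) with the result. */
    static void dft1d(double[] re, double[] im) {
        int M = re.length;
        double s = 1 / Math.sqrt(M);
        double[] outRe = new double[M], outIm = new double[M];
        for (int m = 0; m < M; m++) {
            for (int u = 0; u < M; u++) {
                double phi = 2 * Math.PI * m * u / M;
                outRe[m] += s * (re[u] * Math.cos(phi) + im[u] * Math.sin(phi));
                outIm[m] += s * (im[u] * Math.cos(phi) - re[u] * Math.sin(phi));
            }
        }
        System.arraycopy(outRe, 0, re, 0, M);
        System.arraycopy(outIm, 0, im, 0, M);
    }

    /** Replaces (gRe, gIm), N rows x M columns, by its 2D DFT G(m, n) stored at [n][m]. */
    static void transform(double[][] gRe, double[][] gIm) {
        int N = gRe.length, M = gRe[0].length;
        for (int v = 0; v < N; v++) {            // 1D DFT of every row g(., v)
            dft1d(gRe[v], gIm[v]);
        }
        double[] cRe = new double[N], cIm = new double[N];
        for (int u = 0; u < M; u++) {            // 1D DFT of every column g(u, .)
            for (int v = 0; v < N; v++) { cRe[v] = gRe[v][u]; cIm[v] = gIm[v][u]; }
            dft1d(cRe, cIm);
            for (int v = 0; v < N; v++) { gRe[v][u] = cRe[v]; gIm[v][u] = cIm[v]; }
        }
    }
}
```

Calling Separable2dDft.transform(re, im) on an image with N rows and M columns overwrites both arrays with the spectrum; the rows are transformed directly inside the 2D array, and only one column vector is buffered at a time.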


19.2  Visualizing  the  2D  Fourier  Transform 


Unfortunately, there is no simple method for visualizing 2D complex-valued functions, such as the result of a 2D DFT. One alternative is to display the real and imaginary parts individually as 2D surfaces. Mostly, however, the absolute value of the complex function is displayed, which in the case of the Fourier transform is |G(m,n)|, the power spectrum (see Ch. 18, Sec. 18.3.5).


19.2.1  Range  of  Spectral  Values 

For  most  natural  images,  the  "spectral  energy"  concentrates  at  the
lower  frequencies,  with  a  clear  maximum  at  wave  numbers  (0,  0),  that
is,  at  the  coordinate  center  (see  also  Sec.  19.4).  The  values  of  the
power  spectrum  usually  cover  a  wide  range,  and  displaying  them
linearly  often  makes  the  smaller  values  invisible.  To  show  the  full
range  of  spectral  values,  in  particular  the  smaller  values  at  the  high
frequencies,  it  is  common  to  display  the  square  root  or  the  logarithm
of  the  power  spectrum,  $\sqrt{|G(m,n)|}$  or  $\log|G(m,n)|$,  respectively.
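A minimal sketch of the logarithmic display mapping, assuming the spectrum is stored as arrays re[m][n] and im[m][n]. It uses log(1 + |G|) instead of log|G| so that zero coefficients map to 0 rather than to −∞; this offset, the 0..255 output range, and the class name are our own choices, not part of the text.

```java
/** Maps spectrum magnitudes to 8-bit display values on a logarithmic scale. */
final class SpectrumDisplay {

    static int[][] logScale(double[][] re, double[][] im) {
        int M = re.length, N = re[0].length;
        double[][] L = new double[M][N];
        double max = 0;
        for (int m = 0; m < M; m++)
            for (int n = 0; n < N; n++) {
                double mag = Math.hypot(re[m][n], im[m][n]);   // |G(m,n)|
                L[m][n] = Math.log1p(mag);                      // log(1 + |G|), avoids log(0)
                max = Math.max(max, L[m][n]);
            }
        int[][] out = new int[M][N];
        for (int m = 0; m < M; m++)
            for (int n = 0; n < N; n++)                         // largest value becomes 255
                out[m][n] = (max > 0) ? (int) Math.round(255 * L[m][n] / max) : 0;
        return out;
    }
}
```

Replacing Math.log1p(mag) with Math.sqrt(mag) gives the square-root variant mentioned above.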

19.2.2  Centered  Representation  of  the  DFT  Spectrum 

Analogous  to  the  1D  case,  the  2D  spectrum  is  a  periodic  function  in
both  dimensions,

$$G(m,n) = G(m + pM,\, n + qN), \tag{19.12}$$

for  arbitrary  p,  q  ∈  ℤ.  In  the  case  of  a  real-valued  2D  signal  g(u,v)  ∈  ℝ
(see  Eqn.  (18.57)),  the  power  spectrum  is  also  symmetric  about
the  origin,  that  is,

$$|G(m,n)| = |G(-m,-n)|. \tag{19.13}$$

It  is  thus  common  to  use  a  centered  representation  of  the  spectrum
with  coordinates  m,  n  in  the  ranges

$$-\left\lfloor \tfrac{M}{2} \right\rfloor \le m \le \left\lfloor \tfrac{M-1}{2} \right\rfloor \quad\text{and}\quad -\left\lfloor \tfrac{N}{2} \right\rfloor \le n \le \left\lfloor \tfrac{N-1}{2} \right\rfloor,$$

respectively.  This  is  easily  accomplished  by  swapping  the  four
quadrants  of  the  Fourier  transform,  as  illustrated  in  Fig.  19.3.  In  the
resulting  representation,  the  low-frequency  coefficients  are  found  at
the  center  and  the  high-frequency  entries  along  the  outer  boundaries.
Figure  19.4  shows  a  2D  power  spectrum  plotted  as  an  intensity  image
in  its  original  and  centered  form,  with  the  intensity  proportional
to  the  logarithm  of  the  spectral  values  ($\log_{10}|G(m,n)|$).

Fig.  19.3
Centering  the  2D  Fourier  spectrum.  In  the  original  (noncentered)  spectrum,  the  coordinate  center  (i.e.,  the  region  of  low  frequencies)  is  located  in  the  upper  left  corner  and,  due  to  the  periodicity  of  the  spectrum,  also  at  all  other  corners  (a).  In  this  case,  the  coefficients  for  the  highest  wave  numbers  (frequencies)  lie  at  the  center.  Swapping  the  quadrants  pairwise,  as  shown  in  (b),  moves  all  low-frequency  coefficients  to  the  center  and  high  frequencies  to  the  periphery.  A  real  2D  power  spectrum  is  shown  in  its  original  form  in  (c)  and  in  centered  form  in  (d).

Fig.  19.4
Intensity  plot  of  a  2D  power  spectrum:  original  image  (a),  noncentered  spectrum  (b),  and  centered  spectrum  (c).
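Quadrant swapping can be written as a circular shift of the index grid. The sketch below (class and method names are hypothetical) moves index (0, 0) to (⌊M/2⌋, ⌊N/2⌋). For even M and N this is exactly the pairwise quadrant swap of Fig. 19.3(b); for odd sizes it still yields the centered ranges given above.

```java
/** Centers a spectrum by circularly shifting it so that (0,0) lands at (M/2, N/2). */
final class CenterSpectrum {

    /** Returns a shifted copy of A (indexed [m][n]); stored index k represents wave number k or k - M. */
    static double[][] center(double[][] A) {
        int M = A.length, N = A[0].length;
        double[][] B = new double[M][N];
        for (int m = 0; m < M; m++)
            for (int n = 0; n < N; n++)
                B[(m + M / 2) % M][(n + N / 2) % N] = A[m][n];
        return B;
    }
}
```

Applying center() to a 2 × 2 array {{1, 2}, {3, 4}} swaps the diagonal quadrants and gives {{4, 3}, {2, 1}}.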


19.3  Frequencies  and  Orientation  in  2D 

19.3.1  Effective  Frequency 

As  seen  in  Figs.  19.1  and  19.2,  each  2D  basis  function  is
an  oriented  cosine  or  sine  function  whose  orientation  and  frequency
are  determined  by  its  wave  numbers  m  and  n  for  the  horizontal  and
vertical  directions,  respectively.  If  we  move  along  the  main  direction
of  such  a  basis  function  (i.e.,  perpendicular  to  the  crests  of  the  waves),
we  follow  a  1D  cosine  or  sine  function  of  some  frequency  f,
which  we  call  the  directional  or  effective  frequency  of  the  waveform
(see  Fig.  19.5).

Fig.  19.5
Frequency  and  orientation  in  2D.  The  image  (a)  contains  a  periodic  pattern  with  effective  frequency  f  =  1/λ  and  orientation  ψ.  The  frequency  coefficient  corresponding  to  this  pattern  is  found  at  position  (m,  n)  =  ±f·(M cos ψ,  N sin ψ)  (see  Eqn.  (19.14))  in  the  2D  Fourier  spectrum  (b).  Thus,  if  M  ≠  N,  the  spectral  coefficients  (m,  n)  are  located  at  a  direction  (θ)  different  from  the  orientation  of  the  image  pattern  (ψ).

Recall  that  the  wave  numbers  m,  n  specify  how  many  full  cycles
the  associated  2D  basis  function  performs  over  a  distance  of  M  units
in  the  horizontal  direction  or  N  units  in  the  vertical  direction.  Thus,
if  an  image  of  size  M  ×  N  contains  a  periodic  pattern  with  effective
frequency  f  =  1/λ  and  orientation  ψ,  the  associated  frequency
coefficients  are  found  at  positions

$$\begin{pmatrix} m \\ n \end{pmatrix} = \pm f \cdot \begin{pmatrix} M \cos(\psi) \\ N \sin(\psi) \end{pmatrix} \tag{19.14}$$

in  the  corresponding  2D  Fourier  spectrum  (see  Fig.  19.5).  Given  the
spectral  position  (m,  n),  the  effective  frequency  along  the  main  direction
of  the  wave  can  be  derived  (from  the  1D  case  in  Eqn.  (18.58))  as


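$$f_{(m,n)} = \frac{1}{\tau} \sqrt{\left(\frac{m}{M}\right)^2 + \left(\frac{n}{N}\right)^2}, \tag{19.15}$$

where  we  assume  the  same  spatial  sampling  interval  along  the  x  and
y  axes  (i.e.,  τ  =  τ_x  =  τ_y).  Thus  the  maximum  signal  frequency  in
the  directions  of  the  coordinate  axes  is

$$f_{(\pm M/2,\,0)} = f_{(0,\,\pm N/2)} = \frac{1}{\tau} \cdot \frac{1}{2} = \frac{f_s}{2}, \tag{19.16}$$

where  f_s  =  1/τ  denotes  the  sampling  frequency.  Notice  that  the  effective
signal  frequency  at  the  corners  of  the  spectrum  is

$$f_{(\pm M/2,\,\pm N/2)} = \frac{1}{\tau} \sqrt{\frac{1}{4} + \frac{1}{4}} = \frac{1}{\sqrt{2}\,\tau} = \frac{f_s}{\sqrt{2}}, \tag{19.17}$$

which  is  a  factor  √2  higher  than  along  the  coordinate  axes  (Eqn.
(19.16)).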
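Eqns. (19.15) to (19.17) translate directly into code. The helper below (a hypothetical name) returns the effective frequency in cycles per unit of τ; with τ = 1 the unit is cycles per pixel.

```java
/** Effective frequency of a spectral position, Eqn. (19.15). */
final class EffectiveFrequency {

    /** (m, n): spectral position in an M x N spectrum; tau: sampling interval, the same along x and y. */
    static double at(double m, double n, int M, int N, double tau) {
        double a = m / M, b = n / N;             // cycles per sample along x and y
        return Math.sqrt(a * a + b * b) / tau;
    }
}
```

With τ = 1, the value on the axes is 0.5 cycles per pixel (Eqn. (19.16)), and at the corners it is √2 times larger, about 0.707 cycles per pixel (Eqn. (19.17)).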


19.3.2  Frequency  Limits  and  Aliasing  in  2D 

Figure  19.6  illustrates  the  relationship  described  in  Eqns.  (19.16)  and
(19.17).  The  highest  permissible  signal  frequencies  in  any  direction
lie  along  the  boundary  of  the  centered  2D  spectrum  of  size  M  ×
N.  Any  signal  with  all  frequency  components  within  this  region
complies  with  the  sampling  theorem  (Nyquist  rule)  and  can  thus  be
reconstructed  without  aliasing.  In  contrast,  any  spectral  component
outside  these  limits  is  reflected  across  the  boundary  of  this  box  toward
the  coordinate  center  onto  lower  frequencies,  which  would  appear  as
visual  aliasing  in  the  reconstructed  image.

Evidently,  the  lowest  effective  sampling  frequency  (Eqn.  (19.15))
occurs  in  the  directions  of  the  coordinate  axes  of  the  sampling  grid.
To  ensure  that  a  certain  image  pattern  can  be  reconstructed  without
aliasing  at  any  orientation,  the  effective  signal  frequency  f  of  that
pattern  must  be  limited  to  f_s/2  =  1/(2τ)  in  every  direction,  again  assuming
that  the  sampling  interval  τ  is  the  same  along  both  coordinate  axes.

Fig.  19.6
Maximum  signal  frequencies  and  aliasing  in  2D.  The  boundary  of  the  M  ×  N  2D  spectrum  (inner  rectangle)  marks  the  region  of  permissible  signal  frequencies  along  any  direction.  The  outer  rectangle  corresponds  to  the  effective  sampling  frequency,  which  is  twice  the  maximum  signal  frequency  in  the  same  direction.  The  signal  component  at  spectral  position  A  lies  inside  the  permissible  frequency  range  and  thus  causes  no  aliasing.  In  contrast,  component  B  is  outside  the  permissible  range.  Due  to  the  periodicity  of  the  spectrum,  all  components  repeat  (as  in  the  1D  case)  at  all  multiples  of  the  sampling  frequency  along  the  m  and  n  axes.  This  causes  component  B  to  be  "aliased"  to  a  lower-frequency  position  B′  (and  likewise  its  symmetric  counterpart)  in  the  visible  part  of  the  spectrum.  Note  that  this  also  changes  the  direction  of  the  corresponding  wave  in  signal  space.
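To make the direction dependence concrete, the following sketch (hypothetical names, τ = 1 assumed) maps a pattern of frequency f and orientation ψ to its spectral position with Eqn. (19.14) and checks whether that position lies inside the centered spectrum. Points exactly on the boundary are counted as inside here, which is a borderline case of the sampling theorem.

```java
/** Checks whether an oriented pattern falls inside the centered spectrum (i.e., is not aliased). */
final class AliasCheck {

    /** Eqn. (19.14), + sign: spectral position for frequency f (cycles/pixel) and orientation psi (radians). */
    static double[] position(double f, double psi, int M, int N) {
        return new double[] {f * M * Math.cos(psi), f * N * Math.sin(psi)};
    }

    /** True if |m| <= M/2 and |n| <= N/2 (boundary included). */
    static boolean insideSpectrum(double[] mn, int M, int N) {
        return Math.abs(mn[0]) <= M / 2.0 && Math.abs(mn[1]) <= N / 2.0;
    }
}
```

In a 512 × 512 image, a pattern with f = 0.625 cycles per pixel lands inside the spectrum when oriented at 45° but outside when aligned with an axis, which is why the orientation-independent limit is f_s/2.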


19.3.3  Orientation 

The  spatial  orientation  of  a  2D  cosine  or  sine  wave  with  spectral
coordinates  m,  n  (wave  numbers  0  ≤  m  <  M,  0  ≤  n  <  N)  is

$$\psi_{(m,n)} = \operatorname{ArcTan}\!\left(\frac{m}{M}, \frac{n}{N}\right) = \operatorname{ArcTan}(m N,\, n M), \tag{19.18}$$

where  ψ_(m,n)  for  m  =  n  =  0  is  of  course  undefined.¹  Conversely,  a
2D  sinusoid  with  effective  frequency  f  and  spatial  orientation  ψ  is
represented  by  the  spectral  coordinates

$$(m, n) = \pm f \cdot (M \cos\psi,\; N \sin\psi), \tag{19.19}$$

as  already  shown  in  Fig.  19.5.
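Eqns. (19.18) and (19.19) are inverse to each other, which the following sketch makes explicit. ArcTan(x, y) corresponds to Java's Math.atan2(y, x), with the arguments in swapped order; returning NaN for m = n = 0 is our own convention for the undefined case, and the class name is hypothetical.

```java
/** Orientation of a spectral position and its inverse mapping. */
final class Orientation {

    /** Eqn. (19.18): psi(m, n) = ArcTan(m/M, n/N) in radians; NaN for m = n = 0. */
    static double psi(double m, double n, int M, int N) {
        if (m == 0 && n == 0) return Double.NaN;
        return Math.atan2(n / N, m / M);          // ArcTan(x, y) = Math.atan2(y, x)
    }

    /** Eqn. (19.19), + sign: spectral position for effective frequency f and orientation psi. */
    static double[] position(double f, double psi, int M, int N) {
        return new double[] {f * M * Math.cos(psi), f * N * Math.sin(psi)};
    }
}
```

Since the factors M and N cancel inside psi(), the round trip psi(position(f, ψ)) returns ψ for any image size.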


19.3.4  Normalizing  the  Geometry  of  the  2D  Spectrum 

From  Eqn.  (19.19)  we  can  derive  that,  in  the  special  case  of  a  sinusoid
with  spatial  orientation  ψ  =  45°,  the  corresponding  spectral
coefficients  are  found  at  the  frequency  coordinates

$$(m, n) = \pm(\lambda M,\, \lambda N) \quad\text{for}\quad -\tfrac{1}{2} \le \lambda \le +\tfrac{1}{2}, \tag{19.20}$$

that  is,  on  the  diagonals  of  the  spectrum  (see  also  Eqn.  (19.17)).
Unless  the  image  (and  thus  the  spectrum)  is  square  (M  =  N),
the  angles  of  orientation  in  the  image  and  in  the  spectrum  are  not  the
same,  though  they  coincide  along  the  directions  of  the  coordinate
axes.  This  means  that  rotating  an  image  by  some  angle  α  does  turn
the  spectrum  in  the  same  direction  but,  in  general,  not  by  the  same
angle  α.

¹  ArcTan(x,  y)  in  Eqn.  (19.18)  denotes  the  inverse  tangent  function
tan⁻¹(y/x)  (also  see  Sec.  F.1.6  in  the  Appendix).

Fig.  19.7
Normalizing  the  2D  spectrum.  Original  image  (a)  with  dominant  oriented  patterns  that  show  up  as  clear  peaks  in  the  corresponding  spectrum  (b).  Because  the  image  and  the  spectrum  are  not  square  (M  ≠  N),  orientations  in  the  image  are  not  the  same  as  in  the  actual  spectrum  (b).  After  the  spectrum  is  normalized  to  square  proportions  (c),  we  can  clearly  observe  that  the  cylinders  of  this  (Harley-Davidson  V-Rod)  engine  are  really  arranged  at  a  60°  angle.

To  obtain  identical  orientations  and  turning  angles  in  both  the 
image  and  the  spectrum,  it  is  sufficient  to  scale  the  spectrum  to 
square  size  such  that  the  spectral  resolution  is  the  same  along  both 
frequency  axes  (as  shown  in  Fig.  19.7). 
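The distortion described above is easy to quantify. The sketch below (hypothetical names) compares the angle of a spectral peak measured directly on the M × N grid with the angle measured after rescaling both frequency axes to a common size K. Choosing K = max(M, N) is arbitrary; any common size restores the image orientation ψ.

```java
/** Spectral peak angle before and after normalizing the spectrum to square proportions. */
final class SpectrumGeometry {

    /** Angle of the peak for image orientation psi, measured on the raw M x N spectrum grid. */
    static double rawAngle(double psi, int M, int N) {
        double m = M * Math.cos(psi), n = N * Math.sin(psi);   // Eqn. (19.19); the factor f cancels
        return Math.atan2(n, m);
    }

    /** The same angle after rescaling both frequency axes to a common size K. */
    static double normalizedAngle(double psi, int M, int N) {
        int K = Math.max(M, N);                                  // any common size would do
        double m = M * Math.cos(psi) * K / M, n = N * Math.sin(psi) * K / N;
        return Math.atan2(n, m);
    }
}
```

For a 640 × 480 image and ψ = 45°, the raw angle is tan⁻¹(480/640) ≈ 36.87°, while the normalized angle is 45°. Rotating the pattern to 60° changes the raw angle by more than 15°, in line with the remark on rotation above.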


19.3.5  Effects  of  Periodicity 

When  interpreting  the  2D  DFT  of  images,  one  must  always  be  aware
that,  with  any  discrete  Fourier  transform,  the  signal  function
is  implicitly  assumed  to  be  periodic  in  every  dimension.  Thus
the  transitions  at  the  borders  between  the  replicas  of  the  image  are
also  part  of  the  signal,  just  like  the  interior  of  the  image  itself.  If  there
is  a  large  intensity  difference  between  opposing  borders  of  an  image
(e.g.,  between  the  upper  and  lower  parts  of  a  landscape  image),  then
this  causes  strong  transitions  in  the  resulting  periodic  signal.  Such
steep  discontinuities  are  of  high  bandwidth  (i.e.,  the  corresponding
signal  energy  is  spread  over  a  wide  range  along  the  coordinate  axes
of  the  sampling  grid;  see  Fig.  19.8).  This  broadband  energy  distribution
along  the  main  axes,  which  is  often  observed  with  real  images,
may  mask  other  relevant  signal  components  in  the
spectrum.
Fig.  19.8
Effects  of  periodicity  in  the  2D  spectrum.  The  discrete  Fourier  transform  is  computed  under  the  implicit  assumption  that  the  image  signal  is  periodic  along  both  dimensions  (top).  Large  differences  in  intensity  at  opposite  image  borders  (here  most  notably  in  the  vertical  direction)  lead  to  broadband  signal  components  that  in  this  case  appear  as  a  bright  line  along  the  spectrum's  vertical  axis  (bottom).


19.3.6  Windowing 

One  solution  to  this  problem  is  to  multiply  the  image  function
g(u,v)  =  I(u,v)  by  a  suitable  windowing  function  w(u,v),  that  is,

$$\tilde{g}(u,v) = g(u,v) \cdot w(u,v), \tag{19.21}$$

for  0  ≤  u  <  M,  0  ≤  v  <  N,  prior  to  computing  the  DFT.  The
windowing  function  w(u,v)  should  drop  off  continuously  toward  the
image  borders  such  that  the  transitions  between  image  replicas  are
effectively  eliminated.  But  multiplying  the  image  with  w(u,v)  has
additional  effects  upon  the  spectrum.  As  we  already  know  (from  Eqn.
(18.26)),  a  multiplication  of  two  functions  in  signal  space  corresponds
to  a  convolution  of  the  corresponding  spectra  in  frequency  space,
that  is,

$$\tilde{G}(m,n) = G(m,n) * W(m,n). \tag{19.22}$$

To  cause  the  least  possible  damage  to  the  Fourier  transform  of  the
image,  the  ideal  spectrum  of  w(u,v)  would  be  the  impulse  function
δ(m,n).  Unfortunately,  this  again  corresponds  to  the  constant  windowing
function  w(u,v)  =  1,  which  has  no  windowing  effect  at  all.  In  general,
the  broader  the  spectrum  of  the  windowing  function  w(u,v),  the  more
strongly  it  smooths  the  resulting  spectrum  and  the  harder  it  becomes
to  isolate  individual  frequency  components.

Taking  a  picture  is  equivalent  to  cutting  out  a  finite  (usually
rectangular)  region  from  an  infinite  image  plane,  which  can  be  simply
modeled  as  a  multiplication  with  a  rectangular  pulse  function  of
width  M  and  height  N.  So,  in  this  case,  the  spectrum  of  the  original
intensity  function  is  convolved  with  the  spectrum  of  the  rectangular
pulse  (box).  The  problem  is  that  the  spectrum  of  the  rectangular
box  (see  Fig.  19.9(a))  is  of  extremely  high  bandwidth  and  thus  far
from  the  ideal  narrow  impulse  function.

These  two  examples  demonstrate  a  dilemma:  on  the  one  hand,  windowing
functions  should  be  as  wide  as  possible  to  include  a  maximum
part  of  the  original  image;  on  the  other  hand,  they  should  drop  off  to  zero
toward  the  image  borders,  but  not  too  steeply,  in  order  to  keep
the  windowing  spectrum  narrow.
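The effect of Eqn. (19.21) on the border transitions can be checked numerically. The sketch below (hypothetical names) applies an arbitrary window, passed as a function of (u, v), and measures the largest jump between opposite image borders, that is, across the seams of the periodic image replicas.

```java
import java.util.function.DoubleBinaryOperator;

/** Applies a windowing function (Eqn. (19.21)) and measures the seams between periodic replicas. */
final class ApplyWindow {

    /** Returns g(u,v) * w(u,v) for 0 <= u < M, 0 <= v < N. */
    static double[][] apply(double[][] g, DoubleBinaryOperator w) {
        int M = g.length, N = g[0].length;
        double[][] out = new double[M][N];
        for (int u = 0; u < M; u++)
            for (int v = 0; v < N; v++)
                out[u][v] = g[u][v] * w.applyAsDouble(u, v);
        return out;
    }

    /** Largest absolute difference between opposite borders (where the replicas meet). */
    static double maxBorderJump(double[][] g) {
        int M = g.length, N = g[0].length;
        double jump = 0;
        for (int v = 0; v < N; v++) jump = Math.max(jump, Math.abs(g[M - 1][v] - g[0][v]));
        for (int u = 0; u < M; u++) jump = Math.max(jump, Math.abs(g[u][N - 1] - g[u][0]));
        return jump;
    }
}
```

For instance, an image with a strong top-to-bottom intensity gradient has a large jump between its last and first rows; after multiplication with a window that falls to zero toward the borders, that jump largely disappears.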


19.3.7  Common  Windowing  Functions 

Suitable  windowing  functions  should  therefore  exhibit  soft  transitions,
and  many  variants  have  been  proposed  and  analyzed  both  theoretically
and  for  practical  use  (see,  e.g.,  [34,  Ch.  9,  Sec.  9.3],  [194,
Ch.  10]).  Table  19.1  lists  the  definitions  of  several  popular  windowing
functions;  the  corresponding  2D  (logarithmic)  power  spectra  are
displayed  in  Figs.  19.9  and  19.10.

The  spectrum  of  the  rectangular  pulse  function  (Fig.  19.9(a)),
which  assigns  identical  weights  to  all  image  elements,  exhibits  a  relatively
narrow  peak  at  the  center,  which  promises  little  smoothing  in
the  resulting  windowed  spectrum.  Nevertheless,  the  spectral  energy
drops  off  quite  slowly  toward  the  higher  frequencies,  thus  creating  a
rather  wide  spectrum.  Not  surprisingly,  the  behavior  of  the  elliptical
windowing  function  in  Fig.  19.9(b)  is  quite  similar.  The  Gaussian
window  in  Fig.  19.9(c)  demonstrates  how  the  off-center  spectral
energy  can  be  significantly  suppressed  by  narrowing  the  windowing
function,  but  at  the  cost  of  a  much  wider  peak  at  the  center.
In  fact,  none  of  the  functions  in  Fig.  19.9  makes  a  good  windowing
function.

Obviously,  the  choice  of  a  suitable  windowing  function  is  a  delicate
compromise,  since  even  apparently  similar  functions  may  exhibit
largely  different  behaviors  in  the  frequency  spectrum.  For  example,
good  overall  results  can  be  obtained  with  the  Hanning  window  (Fig.
19.10(c))  or  the  Parzen  window  (Fig.  19.10(d)),  which  are  both  easy
to  compute  and  frequently  used  in  practice.

Figure  19.11  illustrates  the  effects  of  selected  windowing  functions
upon  the  spectrum  of  an  intensity  image.  As  can  be  seen  clearly,  narrowing
the  windowing  function  leads  to  a  suppression  of  the  artifacts
caused  by  the  signal's  implicit  periodicity.  At  the  same  time,  however,
it  also  reduces  the  resolution  of  the  spectrum;  the  spectrum
becomes  blurred,  and  individual  peaks  are  widened.


Table  19.1
2D  windowing  function  definitions.  The  functions  w(u,v)  have  their  maximum  values  at  the  image  center,  w(M/2,  N/2)  =  1.  The  values  r_u,  r_v,  and  r_{u,v}  used  in  the  definitions  are  specified  at  the  top  of  the  table.

Definitions:
$$r_u = \frac{u - M/2}{M/2} = \frac{2u}{M} - 1, \qquad r_v = \frac{v - N/2}{N/2} = \frac{2v}{N} - 1, \qquad r_{u,v} = \sqrt{r_u^2 + r_v^2}$$

Elliptical  window:
$$w(u,v) = \begin{cases} 1 & \text{for } 0 \le r_{u,v} \le 1 \\ 0 & \text{otherwise} \end{cases}$$

Gaussian  window:
$$w(u,v) = e^{-\frac{r_{u,v}^2}{2\sigma^2}}, \qquad \sigma = 0.3, \ldots, 0.4$$

Super-Gaussian  window:
$$w(u,v) = e^{-\frac{r_{u,v}^n}{\kappa}}, \qquad n = 6,\; \kappa = 0.3, \ldots, 0.4$$

Cosine²  window:
$$w(u,v) = \begin{cases} \cos\left(\frac{\pi}{2} r_u\right) \cdot \cos\left(\frac{\pi}{2} r_v\right) & \text{for } 0 \le |r_u|, |r_v| \le 1 \\ 0 & \text{otherwise} \end{cases}$$

Bartlett  window:
$$w(u,v) = \begin{cases} 1 - r_{u,v} & \text{for } 0 \le r_{u,v} \le 1 \\ 0 & \text{otherwise} \end{cases}$$

Hanning  window:
$$w(u,v) = \begin{cases} 0.5 \cdot \left[\cos(\pi\, r_{u,v}) + 1\right] & \text{for } 0 \le r_{u,v} \le 1 \\ 0 & \text{otherwise} \end{cases}$$

Parzen  window:
$$w(u,v) = \begin{cases} 1 - 6\, r_{u,v}^2 + 6\, r_{u,v}^3 & \text{for } 0 \le r_{u,v} < 0.5 \\ 2 \cdot (1 - r_{u,v})^3 & \text{for } 0.5 \le r_{u,v} < 1 \\ 0 & \text{otherwise} \end{cases}$$
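The definitions in Table 19.1 can be coded almost verbatim. The following sketch (class and method names are hypothetical) evaluates each window at a pixel position (u, v) of an M × N image; σ, n, and κ are left as parameters so that the ranges given in the table can be explored.

```java
/** Windowing functions of Table 19.1, evaluated at pixel (u, v) of an M x N image. */
final class Windows {

    // Normalized coordinates: r_u, r_v in [-1, 1), r_uv = normalized radial distance from the center
    static double ru(double u, int M) { return 2.0 * u / M - 1; }
    static double rv(double v, int N) { return 2.0 * v / N - 1; }
    static double ruv(double u, double v, int M, int N) { return Math.hypot(ru(u, M), rv(v, N)); }

    static double elliptical(double u, double v, int M, int N) {
        return ruv(u, v, M, N) <= 1 ? 1 : 0;
    }

    static double gaussian(double u, double v, int M, int N, double sigma) {      // sigma ~ 0.3 ... 0.4
        double r = ruv(u, v, M, N);
        return Math.exp(-r * r / (2 * sigma * sigma));
    }

    static double superGaussian(double u, double v, int M, int N, double n, double kappa) { // n = 6
        return Math.exp(-Math.pow(ruv(u, v, M, N), n) / kappa);
    }

    static double cosine2(double u, double v, int M, int N) {
        double a = ru(u, M), b = rv(v, N);
        return (Math.abs(a) <= 1 && Math.abs(b) <= 1)
                ? Math.cos(Math.PI / 2 * a) * Math.cos(Math.PI / 2 * b) : 0;
    }

    static double bartlett(double u, double v, int M, int N) {
        double r = ruv(u, v, M, N);
        return r <= 1 ? 1 - r : 0;
    }

    static double hanning(double u, double v, int M, int N) {
        double r = ruv(u, v, M, N);
        return r <= 1 ? 0.5 * (Math.cos(Math.PI * r) + 1) : 0;
    }

    static double parzen(double u, double v, int M, int N) {
        double r = ruv(u, v, M, N);
        if (r < 0.5) return 1 - 6 * r * r + 6 * r * r * r;
        if (r < 1) return 2 * Math.pow(1 - r, 3);
        return 0;
    }
}
```

Because r_u and r_v are normalized separately, the support of the radial windows is an ellipse inscribed in the image rectangle, not a circle.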

19.4  2D  Fourier  Transform  Examples 

The  following  examples  demonstrate  some  basic  properties  of  the  2D
DFT  on  real  intensity  images.  All  examples  in  Figs.  19.12-19.18
show  a  centered  spectrum,  normalized  to  square  proportions  (see  Sec.  19.3.4),
with  logarithmic  intensity  values  (see  Sec.  19.2).


Scaling 

Figure  19.12  shows  that  scaling  the  image  in  signal  space  has  the
opposite  effect  in  frequency  space,  analogous  to  the  1D  case  (see  Ch.
18,  Fig.  18.4).


Periodic  Image  Patterns 
The  images  in  Fig.  19.13  contain  repetitive  periodic  patterns  at  various
orientations  and  scales.  They  appear  as  distinct  peaks  at  the
corresponding  positions  (see  Eqn.  (19.19))  in  the  spectrum.

Fig.  19.9
Windowing  functions  and  their  logarithmic  power  spectra.  Rectangular  pulse  (a),  elliptical  window  (b),  Gaussian  window  with  σ  =  0.3  (c),  and  super-Gaussian  window  of  order  n  =  6  and  κ  =  0.3  (d).  The  windowing  functions  are  deliberately  of  nonsquare  size  (M  :  N  =  1  :  2).


Rotation 

Figure  19.14  shows  that  rotating  the  image  by  some  angle  α  rotates
the  spectrum  in  the  same  direction  and,  if  the  image  is  square,  by
the  same  angle.

Oriented,  elongated  structures

Pictures  of  artificial  objects  often  exhibit  regular  patterns  or  elongated
structures  that  stand  out  prominently  in  the  spectrum.  The  images
in  Fig.  19.15  contain  several  elongated  structures  that  appear  in
the  spectrum  as  bright  streaks  oriented  perpendicularly  to  the  main
direction  of  the  image  patterns.

Fig.  19.10
Windowing  functions  and  their  logarithmic  power  spectra  (continued).  Cosine²  window  (a),  Bartlett  window  (b),  Hanning  window  (c),  and  Parzen  window  (d).

Natural  images 

Straight  and  regular  structures  are  usually  less  dominant  in  images  of 
natural  objects  and  scenes,  and  thus  the  visual  effects  in  the  spectrum 
are  not  as  obvious  as  with  artificial  objects.  Some  examples  of  this 
class  of  images  are  shown  in  Figs.  19.16  and  19.17. 
Print  patterns

The  regular  patterns  generated  by  common  raster  printing  techniques
(Fig.  19.18)  are  typical  examples  of  periodic  multidirectional
structures,  which  stand  out  clearly  in  the  corresponding  Fourier  spectrum.

Fig.  19.11
Application  of  windowing  functions  on  images.  The  plots  show  the  windowing  function  w(u,v),  the  logarithmic  power  spectrum  of  the  windowing  function  log|W(m,n)|,  the  windowed  image  g(u,v)·w(u,v),  and  the  power  spectrum  of  the  windowed  image  log|G(m,n)  *  W(m,n)|.
[Figure  panels,  by  column:  window  function  w(u,v)  (linear);  window  spectrum  log|W(m,n)|  (logarithmic);  windowed  image  g(u,v)·w(u,v);  windowed  image  spectrum  log|G(m,n)  *  W(m,n)|  (logarithmic).  Rows:  (a)  square  window,  (b)  Cosine²  window,  (c)  Bartlett  window,  (d)  Hanning  window,  (e)  Parzen  window,  (f)  Gaussian  window.]
Fig.  19.12
DFT  under  image  scaling.  The  rectangular  pulse  in  the  image  function  (a-c)  creates  a  strongly  oscillating  power  spectrum  (d-f),  as  in  the  1D  case.  Stretching  the  image  causes  the  spectrum  to  contract  and  vice  versa.

Fig.  19.13
DFT  of  oriented,  repetitive  patterns.  The  image  function  (a-c)  contains  patterns  with  three  dominant  orientations,  which  appear  as  pairs  of  corresponding  frequency  spots  in  the  spectrum  (d-f).  Enlarging  the  image  causes  the  spectrum  to  contract.


19.5  Applications  of  the  DFT 

The  Fourier  transform,  and  the  DFT  in  particular,  are  important  tools
in  many  engineering  disciplines.  In  digital  signal  and  image  processing,
the  DFT  (and  the  FFT)  is  an  indispensable  "workhorse"  with
many  applications  in  image  analysis,  filtering,  and  image  reconstruction,
just  to  name  a  few.


19.5.1  Linear  Filter  Operations  in  Frequency  Space 

Performing  linear  filter  operations  in  frequency  space  is  an  interesting
option  because  it  provides  an  efficient  way  to  apply  filters  of  large  spatial
extent.  The  approach  is  based  on  the  convolution  property  of  the
Fourier  transform  (see  Ch.  18,  Sec.  18.1.6),  which  states  that  a  linear
convolution  in  image  space  corresponds  to  a  pointwise  multiplication
in  frequency  space.  Thus  the  linear  convolution  g  *  h  →  g′  between
an  image  g(u,v)  and  a  filter  matrix  h(u,v)  can  be  accomplished  by
the  following  steps:

$$\begin{array}{lccccc}
\text{image space:} & g(u,v) & * & h(u,v) & = & g'(u,v) \\
 & \downarrow\,\scriptstyle\text{DFT} & & \downarrow\,\scriptstyle\text{DFT} & & \uparrow\,\scriptstyle\text{DFT}^{-1} \\
\text{frequency space:} & G(m,n) & \cdot & H(m,n) & \rightarrow & G'(m,n)
\end{array} \tag{19.23}$$

First,  the  image  g  and  the  filter  kernel²  h  are  transformed  to  frequency
space  using  the  2D  DFT.  The  corresponding  spectra  G  and
H  are  then  multiplied  (pointwise),  and  the  result  G′  is  subsequently
transformed  back  to  image  space  using  the  inverse  DFT,  thus  generating
the  filtered  image  g′.

²  Note  that  the  symbol  h  is  used  here  for  any  1D  or  2D  filter  kernel  and  H
for  the  corresponding  Fourier  spectrum.  This  should  not  be  confused
with  the  use  of  h,  H  for  1D  and  2D  filter  kernels,  respectively,  in  Ch.  5.

Fig.  19.14
DFT  under  image  rotation.  The  original  image  (a)  is  rotated  by  15°  (b)  and  30°  (c).  The  corresponding  (square-normalized)  spectrum  turns  in  the  same  direction  and  by  exactly  the  same  amount  (d-f).

Fig.  19.15
DFT  of  superimposed  image  patterns.  Strong,  oriented  subpatterns  (a-c)  are  easy  to  identify  in  the  corresponding  spectrum  (d-f).  Notice  the  broadband  effects  caused  by  straight  structures,  such  as  the  dark  beam  on  the  wall  in  (b,  e).

Fig.  19.16
DFT  of  natural  image  patterns.  Examples  of  repetitive  structures  in  natural  images  (a-c)  that  are  also  visible  in  the  corresponding  spectrum  (d-f).

Fig.  19.17
DFT  of  natural  image  patterns  with  no  dominant  orientation.  The  repetitive  patterns  contained  in  these  images  (a-c)  have  no  common  orientation  or  sufficiently  regular  spacing  to  stand  out  locally  in  the  corresponding  Fourier  spectra  (d-f).
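The steps of Eqn. (19.23) can be checked with a small Java sketch (hypothetical names). It assumes that the kernel h is zero-padded to the image size with its origin at (0, 0), and it uses an unnormalized forward DFT with a 1/(MN) inverse, a convention chosen here so that the product G·H needs no extra scale factor. Because the DFT treats both arrays as periodic, the result is a circular convolution; a linear convolution without wrap-around would require padding both g and h to at least the sum of their extents minus one in each dimension.

```java
/** Filtering via G(m,n) * H(m,n), compared with a direct circular convolution. */
final class FrequencyFilter {

    /** Naive 2D DFT of (re, im); forward unscaled, inverse scaled by 1/(MN). Returns {re, im}. */
    static double[][][] dft2d(double[][] re, double[][] im, boolean forward) {
        int M = re.length, N = re[0].length;
        double sign = forward ? -1 : 1, scale = forward ? 1.0 : 1.0 / (M * N);
        double[][] outRe = new double[M][N], outIm = new double[M][N];
        for (int m = 0; m < M; m++)
            for (int n = 0; n < N; n++) {
                double sr = 0, si = 0;
                for (int u = 0; u < M; u++)
                    for (int v = 0; v < N; v++) {
                        double phi = 2 * Math.PI * ((double) m * u / M + (double) n * v / N);
                        double c = Math.cos(phi), s = sign * Math.sin(phi);
                        sr += re[u][v] * c - im[u][v] * s;
                        si += re[u][v] * s + im[u][v] * c;
                    }
                outRe[m][n] = scale * sr;
                outIm[m][n] = scale * si;
            }
        return new double[][][] {outRe, outIm};
    }

    /** Filters a real image g with a real kernel h of the same size M x N via the DFT. */
    static double[][] filterViaDft(double[][] g, double[][] h) {
        int M = g.length, N = g[0].length;
        double[][][] G = dft2d(g, new double[M][N], true);
        double[][][] H = dft2d(h, new double[M][N], true);
        double[][] pRe = new double[M][N], pIm = new double[M][N];
        for (int m = 0; m < M; m++)
            for (int n = 0; n < N; n++) {                  // pointwise complex product G * H
                pRe[m][n] = G[0][m][n] * H[0][m][n] - G[1][m][n] * H[1][m][n];
                pIm[m][n] = G[0][m][n] * H[1][m][n] + G[1][m][n] * H[0][m][n];
            }
        return dft2d(pRe, pIm, false)[0];                 // imaginary part is ~0 for real g, h
    }

    /** Direct circular convolution g'(u,v) = sum_{i,j} g(u-i, v-j) * h(i,j), indices modulo M, N. */
    static double[][] convolveCircular(double[][] g, double[][] h) {
        int M = g.length, N = g[0].length;
        double[][] out = new double[M][N];
        for (int u = 0; u < M; u++)
            for (int v = 0; v < N; v++) {
                double sum = 0;
                for (int i = 0; i < M; i++)
                    for (int j = 0; j < N; j++)
                        sum += g[Math.floorMod(u - i, M)][Math.floorMod(v - j, N)] * h[i][j];
                out[u][v] = sum;
            }
        return out;
    }
}
```

Both methods should agree up to rounding error; the naive dft2d is only for checking and would be replaced by an FFT in practice.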

The  main  advantage  of  this  "detour"  lies  in  its  potential  efficiency.
A  direct  convolution  of  an  image  of  size  M  ×  M  with  a  filter  matrix
of  size  N  ×  N  requires  O(M²N²)  operations.  Thus,  the  time  complexity
increases  quadratically  with  the  filter  size  N,  which  is  usually  no  problem
for  small  filters  but  may  render  some  larger  filters  too  costly  to  implement.
For  example,  a  filter  of  size  50  ×  50  already  requires  about  2500
multiplications  and  additions  for  every  image  pixel.  In  comparison,
the  transformation  from  image  to  frequency  space  and  back  can  be
performed  in  O(M²  log₂ M)  using  the  FFT,  and  the  pointwise  multiplication
in  frequency  space  requires  M²  operations,  independent  of
the  filter  size.
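As a rough worked example of these counts (constant factors ignored; the sizes M = 1024 and N = 50 are chosen only for illustration):

$$M^2 N^2 = 1024^2 \cdot 50^2 \approx 2.6 \cdot 10^{9} \qquad\text{versus}\qquad M^2 \log_2 M = 1024^2 \cdot 10 \approx 1.05 \cdot 10^{7},$$

a ratio of about N²/log₂ M = 2500/10 = 250. The FFT route actually needs several transforms (image, zero-padded kernel, and inverse) plus the M² pointwise products, which reduces this advantage by a constant factor but does not change how it grows with N.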

In  addition,  certain  types  of  filters  are  easier  to  specify  in  frequency
space  than  in  image  space;  for  example,  an  ideal  low-pass
filter  can  be  described  very  compactly  in  frequency  space.

Fig.  19.18
DFT  of  a  print  pattern.  The  regular  diagonally  oriented  raster  pattern  (a,  b)  is  clearly  visible  in  the  corresponding  power  spectrum  (c).  It  is  possible  (at  least  in  principle)  to  remove  such  patterns  by  erasing  these  peaks  in  the  Fourier  spectrum  and  reconstructing  the  smoothed  image  from  the  modified  spectrum  using  the  inverse  DFT.

Further  details  on  filter  operations  in  frequency  space  can  be  found, 
for  example,  in  [88,  Sec.  4.4]. 

19.5.2  Linear  Convolution  and  Correlation 

As  discussed  in  Chapter  5,  Sec.  5.3,  a  linear  correlation  is  the  same 
as  a  linear  convolution  with  a  mirrored  filter  function.  Therefore,  the 
correlation  can  be  computed  just  like  the  convolution  operation  in  the 
frequency  domain  by  following  the  steps  described  in  Eqn.  (19.23). 
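The equivalence of correlation and convolution with a mirrored kernel can be verified directly. The sketch below (hypothetical names) uses circular (periodic) indexing, matching what the DFT implies, and assumes g and h have the same size. For a real kernel, mirroring corresponds to taking the complex conjugate of H(m,n) in frequency space.

```java
/** Circular correlation expressed as circular convolution with a mirrored kernel. */
final class CorrelationViaConvolution {

    /** sum_{i,j} g(u + sign*i, v + sign*j) * h(i,j), periodic indexing; g and h are both M x N. */
    private static double[][] circular(double[][] g, double[][] h, int sign) {
        int M = g.length, N = g[0].length;
        double[][] out = new double[M][N];
        for (int u = 0; u < M; u++)
            for (int v = 0; v < N; v++) {
                double sum = 0;
                for (int i = 0; i < M; i++)
                    for (int j = 0; j < N; j++)
                        sum += g[Math.floorMod(u + sign * i, M)][Math.floorMod(v + sign * j, N)] * h[i][j];
                out[u][v] = sum;
            }
        return out;
    }

    /** Correlation: sum_{i,j} g(u+i, v+j) * h(i,j). */
    static double[][] correlate(double[][] g, double[][] h) { return circular(g, h, +1); }

    /** Convolution: sum_{i,j} g(u-i, v-j) * h(i,j). */
    static double[][] convolve(double[][] g, double[][] h) { return circular(g, h, -1); }

    /** Mirrored kernel h'(i,j) = h(-i, -j), indices taken modulo the kernel size. */
    static double[][] mirror(double[][] h) {
        int M = h.length, N = h[0].length;
        double[][] hm = new double[M][N];
        for (int i = 0; i < M; i++)
            for (int j = 0; j < N; j++)
                hm[i][j] = h[Math.floorMod(-i, M)][Math.floorMod(-j, N)];
        return hm;
    }
}
```

Since convolve(g, mirror(h)) equals correlate(g, h), the convolution in this identity can in turn be computed with the frequency-space steps of Eqn. (19.23).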

This  could  be  advantageous  for  comparing  images  using  correlation
methods  (see  Ch.  23,  Sec.  23.1.1),  because  in  this  case  the  image  and
the  "filter"  matrix  (i.e.,  the  second  image)  are  of  similar  size  and  thus
usually  too  large  to  be  processed  efficiently  in  image  space.

Some  operations  in  ImageJ,  such  as  correlate,  convolve,  or  deconvolve,
are  also  implemented  in  the  "Fourier  domain"  (FD)  using  the
2D  DFT.  They  can  be  invoked  through  the  menu  Process  >  FFT  >  FD
Math.

19.5.3  Inverse  Filters 

Filtering  in  the  frequency  domain  opens  another  interesting  perspective:
reversing  the  effects  of  a  filter,  at  least  under  restricted  conditions.
In  the  following,  we  describe  only  the  basic  idea.

Assume  we  are  given  an  image  g_blur  that  has  been  generated  from
an  original  image  g_orig  by  some  linear  filter,  for  example,  motion  blur
induced  by  a  moving  camera.  Under  the  assumption  that  this  image
modification  can  be  modeled  sufficiently  by  a  linear  filter  function
h_blur,  we  can  state  that

$$g_{\text{blur}}(u,v) = (g_{\text{orig}} * h_{\text{blur}})(u,v). \tag{19.24}$$

Recalling  that  in  frequency  space  this  corresponds  to  a  multiplication
of  the  corresponding  spectra,  that  is,

$$G_{\text{blur}}(m,n) = G_{\text{orig}}(m,n) \cdot H_{\text{blur}}(m,n), \tag{19.25}$$

it  should  be  possible  to  reconstruct  the  original  (non-blurred)  image
by  computing  the  inverse  Fourier  transform  of  the  expression

$$G_{\text{orig}}(m,n) = \frac{G_{\text{blur}}(m,n)}{H_{\text{blur}}(m,n)}. \tag{19.26}$$

Unfortunately,  this  "inverse  filter"  only  works  if  the  spectral  coefficients
H_blur  are  nonzero,  because  otherwise  the  resulting  values  are
infinite.  But  even  small  values  of  H_blur,  which  are  typical  at  high
frequencies,  lead  to  large  coefficients  in  the  reconstructed  spectrum
and,  as  a  consequence,  large  amounts  of  image  noise.

Fig.  19.19
Removing  motion  blur  by  inverse  filtering.  Original  image  (a);  image  blurred  by  horizontal  motion  (b);  reconstruction  using  the  exact  (known)  filter  function  (c);  result  of  the  inverse  filter  when  the  filter  function  deviates  marginally  from  the  real  filter  (d).
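A 1D sketch of Eqn. (19.26) shows both sides of the argument above: with an exactly known kernel whose spectrum has no zeros, the original is recovered up to rounding error, while a kernel such as (0.5, 0.5) has H_blur = 0 at the highest frequency of an even-length signal. The threshold eps, below which coefficients are simply set to zero, is a crude safeguard added for this illustration; it is not the Wiener filter mentioned below, and all names are hypothetical.

```java
/** Naive 1D inverse filter (deconvolution) following Eqn. (19.26). */
final class InverseFilter1d {

    /** Naive 1D DFT; forward unscaled, inverse scaled by 1/M (a convention of this sketch). Returns {re, im}. */
    static double[][] dft(double[] re, double[] im, boolean forward) {
        int M = re.length;
        double sign = forward ? -1 : 1, scale = forward ? 1.0 : 1.0 / M;
        double[] oRe = new double[M], oIm = new double[M];
        for (int m = 0; m < M; m++)
            for (int u = 0; u < M; u++) {
                double phi = 2 * Math.PI * m * u / M;
                double c = Math.cos(phi), s = sign * Math.sin(phi);
                oRe[m] += scale * (re[u] * c - im[u] * s);
                oIm[m] += scale * (re[u] * s + im[u] * c);
            }
        return new double[][] {oRe, oIm};
    }

    /** Circular convolution of g with kernel h (same length, origin at index 0), cf. Eqn. (19.24). */
    static double[] blur(double[] g, double[] h) {
        int M = g.length;
        double[] out = new double[M];
        for (int u = 0; u < M; u++)
            for (int i = 0; i < M; i++)
                out[u] += g[Math.floorMod(u - i, M)] * h[i];
        return out;
    }

    /** G_orig = G_blur / H_blur; coefficients with |H_blur| < eps are set to 0 instead of dividing. */
    static double[] deblur(double[] gBlur, double[] h, double eps) {
        int M = gBlur.length;
        double[][] G = dft(gBlur, new double[M], true);
        double[][] H = dft(h, new double[M], true);
        double[] qRe = new double[M], qIm = new double[M];
        for (int m = 0; m < M; m++) {
            double c = H[0][m], d = H[1][m], den = c * c + d * d;
            if (Math.sqrt(den) < eps) continue;          // would divide by (almost) zero
            double a = G[0][m], b = G[1][m];
            qRe[m] = (a * c + b * d) / den;              // (a + ib) / (c + id)
            qIm[m] = (b * c - a * d) / den;
        }
        return dft(qRe, qIm, false)[0];
    }
}
```

With the skipped coefficient, the reconstruction for the (0.5, 0.5) kernel stays finite but loses the corresponding frequency component of the original.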

It  is  also  important  that  the  real  filter  function  be  accurately  approximated,
because  otherwise  the  reconstructed  image  may  strongly
deviate  from  the  original.  The  example  in  Fig.  19.19  shows  an  image
that  has  been  blurred  by  smooth  horizontal  motion,  whose  effect
can  easily  be  modeled  as  a  linear  convolution.  If  the  filter  function
that  caused  the  blurring  is  known  exactly,  then  the  reconstruction  of
the  original  image  can  be  accomplished  without  any  problems  (Fig.
19.19(c)).  However,  as  shown  in  Fig.  19.19(d),  large  errors  occur  if
the  inverse  filter  deviates  only  marginally  from  the  real  filter,  which
quickly  renders  the  method  useless.

Beyond  this  simple  idea  (which  is  often  referred  to  as  "deconvolution"),
better  methods  for  inverse  filtering  exist,  such  as  the
Wiener  filter  and  related  techniques  (see,  e.g.,  [88,  Sec.  5.4],  [128,  Sec.
8.3],  [126,  Sec.  17.8],  [43,  Ch.  16]).


19.6  Exercises 

Exercise  19.1.  Implement  the  2D  DFT  using  the  1D  DFT,  as  described
in  Sec.  19.1.2.  Apply  the  2D  DFT  to  real  intensity  images
of  arbitrary  size  and  display  the  results  (by  converting  to  ImageJ
FloatProcessor  images).  Implement  the  inverse  transform  and  verify
that  the  back-transformed  result  is  identical  to  the  original  image.

Exercise  19.2.  Assume  that  the  2D  Fourier  spectrum  of  an  image
with  size  640  ×  480  and  a  spatial  resolution  of  72  dpi  shows  a  dominant
peak  at  position  ±(100,  100).  Determine  the  orientation  and  effective
frequency  (in  cycles  per  cm)  of  the  corresponding  image  pattern.

Exercise  19.3.  An  image  with  size  800  ×  600  contains  a  wavy  intensity
pattern  with  an  effective  period  of  12  pixels,  oriented  at  30°.  At
which  frequency  coordinates  will  this  pattern  manifest  itself  in  the
discrete  Fourier  spectrum?

Exercise  19.4.  Generalize  Eqn.  (19.15)  and  Eqns.  (19.17)-(19.19)
for  the  case  where  the  sampling  intervals  are  not  identical  along  the
x  and  y  axes  (i.e.,  for  τ_x  ≠  τ_y).

Exercise  19.5.  Implement  the  elliptical  and  the  super-Gaussian  windowing
functions  (Table  19.1)  as  ImageJ  plugins,  and  investigate  the
effects  of  these  windows  upon  the  resulting  spectra.  Also  compare
the  results  to  the  case  where  no  windowing  function  is  used.
20

The  Discrete  Cosine  Transform  (DCT) 


The  Fourier  transform  and  the  DFT  are  designed  for  processing 
complex- valued  signals,  and  they  always  produce  a  complex- valued 
spectrum  even  in  the  case  where  the  original  signal  was  strictly  real¬ 
valued.  The  reason  is  that  neither  the  real  nor  the  imaginary  part  of 
the  Fourier  spectrum  alone  is  sufficient  to  represent  (i.e.,  reconstruct) 
the  signal  completely.  In  other  words,  the  corresponding  cosine  (for 
the  real  part)  or  sine  functions  (for  the  imaginary  part)  alone  do  not 
constitute  a  complete  set  of  basis  functions. 

On  the  other  hand,  we  know  (see  Ch.  18,  Eqn.  (18.21))  that  a 
real-valued  signal  has  a  symmetric  Fourier  spectrum,  so  only  one 
half  of  the  spectral  coefficients  need  to  be  computed  without  losing 
any  signal  information. 

There are several spectral transformations that have properties similar to the DFT but do not work with complex function values. The discrete cosine transform (DCT) is a well-known example that is particularly interesting in our context because it is frequently used for image and video compression. The DCT uses only cosine functions of various wave numbers as basis functions and operates on real-valued signals and spectral coefficients. Similarly, there is also a discrete sine transform (DST) based on a system of sine functions [128].


20.1 1D DCT

The discrete cosine transform is not, as one might wrongly assume, merely a "one-half" variant of the discrete Fourier transform. In the 1D case, the forward cosine transform for a signal $g(u)$ of length $M$ is defined as


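$$G(m) = \sqrt{\frac{2}{M}} \sum_{u=0}^{M-1} g(u) \cdot c_m \cdot \cos\!\Big(\pi \frac{m(2u+1)}{2M}\Big), \qquad (20.1)$$

for $0 \le m < M$, and the inverse transform is

$$g(u) = \sqrt{\frac{2}{M}} \sum_{m=0}^{M-1} G(m) \cdot c_m \cdot \cos\!\Big(\pi \frac{m(2u+1)}{2M}\Big), \qquad (20.2)$$

for $0 \le u < M$, with

$$c_m = \begin{cases} \dfrac{1}{\sqrt{2}} & \text{for } m = 0,\\ 1 & \text{otherwise.} \end{cases} \qquad (20.3)$$

© Springer-Verlag London 2016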


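Note that the index variables $(u, m)$ are used differently in the forward and inverse transforms (Eqns. (20.1) and (20.2)): in the forward transform the factor $c_m$ depends only on the outer index $m$, while in the inverse transform it depends on the summation index. Thus, unlike the DFT, the two transforms are not symmetric.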


20.1.1  DCT  Basis  Functions 

One  may  ask  how  it  is  possible  that  the  DCT  can  work  without 
any  sine  functions,  while  they  are  essential  in  the  DFT.  The  trick 
is  to  divide  all  frequencies  in  half  such  that  they  are  spaced  more 
densely  and  thus  the  frequency  resolution  in  the  spectrum  is  doubled. 
Comparing  the  cosine  parts  of  the  DFT  basis  functions  (Eqn.  (18.49)) 
and  those  of  the  DCT  (Eqn.  (20.1)), 

$$\text{DFT:}\quad C_m^M(u) = \cos\!\Big(2\pi \frac{m u}{M}\Big), \qquad (20.4)$$

$$\text{DCT:}\quad D_m^M(u) = \cos\!\Big(\pi \frac{m(2u+1)}{2M}\Big) = \cos\!\Big(2\pi \frac{m(u+0.5)}{2M}\Big), \qquad (20.5)$$

one can see that, for a given wave number $m$, the period ($\tau_m = 2M/m$) of the corresponding DCT basis function is double the period of the DFT basis function ($\tau_m = M/m$). Notice that the DCT basis functions are also phase-shifted by 0.5 units.

Figure 20.1 shows the DCT basis functions $D_m^M(u)$ for the signal length $M = 8$ and wave numbers $m = 0, \ldots, 7$. For example, the basis function $D_7^8(u)$ for wave number $m = 7$ performs seven full cycles over a length of $2M = 16$ units and therefore has the radial frequency $\omega = m/2 = 3.5$.
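To make Eqn. (20.5) concrete, the short sketch below evaluates $D_m^M(u)$ in both of its equivalent forms and computes the period $2M/m$. The class `DctBasis` and its methods are hypothetical names chosen for illustration; they are not part of the book's code or of any library.

```java
// Hypothetical helper class (illustration only): evaluates the DCT basis
// function D_m^M(u) of Eqn. (20.5) in both of its equivalent forms.
class DctBasis {

    // D_m^M(u) = cos(pi * m * (2u + 1) / (2M))
    static double basis(int m, int M, double u) {
        return Math.cos(Math.PI * m * (2 * u + 1) / (2.0 * M));
    }

    // Same function written as cos(2 * pi * m * (u + 0.5) / (2M)):
    // a cosine with period 2M/m, shifted by half a sample
    static double basisShifted(int m, int M, double u) {
        return Math.cos(2 * Math.PI * m * (u + 0.5) / (2.0 * M));
    }

    // Period of D_m^M for wave number m > 0 (twice the DFT period M/m)
    static double period(int m, int M) {
        return 2.0 * M / m;
    }

    public static void main(String[] args) {
        int M = 8;
        int m = 7;
        System.out.println("period of D_7 for M = 8: " + period(m, M));
        for (int u = 0; u < M; u++) {
            // both columns should show the same values
            System.out.printf("u = %d: %9.5f %9.5f%n",
                    u, basis(m, M, u), basisShifted(m, M, u));
        }
    }
}
```

Since the basis function is also defined for non-integer $u$, the same methods can be used to plot the continuous curves shown in Fig. 20.1.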

20.1.2 Implementing the 1D DCT

Since  the  DCT  does  not  create  any  complex  values  and  the  forward 
and  inverse  transforms  (Eqns.  (20.1)  and  (20.2))  are  almost  identical, 
the  whole  procedure  is  fairly  easy  to  implement  in  Java,  as  shown  in 
Prog.  20.1.  The  only  notable  detail  is  that  the  factor  cm  in  Eqn. 
(20.1)  is  independent  of  the  iteration  variable  u  and  can  thus  be 
calculated  outside  the  inner  summation  loop  (see  Prog.  20.1,  line  8). 

Of course, much more efficient ("fast") DCT algorithms exist. Moreover, the DCT can also be computed in $O(M \log_2 M)$ time using the FFT [128, p. 152].
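The connection between the DCT and the DFT can be illustrated with a small sketch, assuming the common approach of mirroring the signal to length $2M$ (one possible route; the specific algorithm described in [128] may differ). The class `DctViaDft` is a hypothetical name. For clarity, the DFT is evaluated naively in $O(M^2)$ time, so this code is not itself fast; replacing that inner loop with an FFT of length $2M$ is what would give $O(M \log_2 M)$.

```java
// Sketch only: computes the 1D DCT of Eqn. (20.1) through the DFT of a
// symmetrically extended signal of length 2M. The DFT is evaluated naively
// (O(M^2)); an FFT of length 2M would make the same scheme O(M log M).
class DctViaDft {

    // Direct evaluation of Eqn. (20.1), used as a reference
    static double[] dctDirect(double[] g) {
        int M = g.length;
        double s = Math.sqrt(2.0 / M);
        double[] G = new double[M];
        for (int m = 0; m < M; m++) {
            double cm = (m == 0) ? 1.0 / Math.sqrt(2) : 1.0;
            double sum = 0;
            for (int u = 0; u < M; u++) {
                sum += g[u] * Math.cos(Math.PI * m * (2 * u + 1) / (2.0 * M));
            }
            G[m] = s * cm * sum;
        }
        return G;
    }

    // Same result via the DFT of y = (g(0), ..., g(M-1), g(M-1), ..., g(0))
    static double[] dctViaDft(double[] g) {
        int M = g.length;
        int N = 2 * M;
        double[] y = new double[N];
        for (int u = 0; u < M; u++) {
            y[u] = g[u];
            y[N - 1 - u] = g[u];   // mirrored copy
        }
        double s = Math.sqrt(2.0 / M);
        double[] G = new double[M];
        for (int m = 0; m < M; m++) {
            // Y(m) = sum_u y(u) * exp(-i * 2 * pi * m * u / N), naive DFT
            double re = 0, im = 0;
            for (int u = 0; u < N; u++) {
                double phi = 2 * Math.PI * m * u / N;
                re += y[u] * Math.cos(phi);
                im -= y[u] * Math.sin(phi);
            }
            // Re[exp(-i * pi * m / N) * Y(m)] = 2 * sum_u g(u) * cos(pi * m * (2u + 1) / (2M))
            double psi = Math.PI * m / N;
            double half = 0.5 * (re * Math.cos(psi) + im * Math.sin(psi));
            double cm = (m == 0) ? 1.0 / Math.sqrt(2) : 1.0;
            G[m] = s * cm * half;
        }
        return G;
    }

    public static void main(String[] args) {
        double[] g = {1, 2, 3, 4, 7, 2, 0, 9};
        System.out.println(java.util.Arrays.toString(dctDirect(g)));
        System.out.println(java.util.Arrays.toString(dctViaDft(g)));
    }
}
```

Both methods should print the same coefficients up to rounding errors.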


20.2  2D  DCT 
Fig. 20.1
DCT basis functions $D_0^8(u), \ldots, D_7^8(u)$ for $M = 8$. Each plot shows both the discrete function (round dots) and the corresponding continuous function. Compared with the basis functions of the DFT (Figs. 18.11 and 18.12), all frequencies are divided in half and the DCT basis functions are phase-shifted by 0.5 units. All DCT basis functions are thus periodic over the length $2M = 16$ (as compared with $M$ for the DFT).

The 2D form of the DCT follows immediately from the 1D definition (Eqns. (20.1) and (20.2)), resulting in the 2D forward transform

$$G(m,n) = \frac{2}{\sqrt{MN}} \sum_{u=0}^{M-1} \sum_{v=0}^{N-1} g(u,v) \cdot c_m \cos\!\Big(\frac{\pi(2u+1)m}{2M}\Big) \cdot c_n \cos\!\Big(\frac{\pi(2v+1)n}{2N}\Big) \qquad (20.6)$$

$$\phantom{G(m,n)} = \frac{2 \cdot c_m \cdot c_n}{\sqrt{MN}} \sum_{u=0}^{M-1} \sum_{v=0}^{N-1} g(u,v) \cdot D_m^M(u) \cdot D_n^N(v), \qquad (20.7)$$

for $0 \le m < M$, $0 \le n < N$, and the inverse transform

$$g(u,v) = \frac{2}{\sqrt{MN}} \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} G(m,n) \cdot c_m \cos\!\Big(\frac{\pi(2u+1)m}{2M}\Big) \cdot c_n \cos\!\Big(\frac{\pi(2v+1)n}{2N}\Big) \qquad (20.8)$$

$$\phantom{g(u,v)} = \frac{2}{\sqrt{MN}} \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} G(m,n) \cdot c_m \cdot D_m^M(u) \cdot c_n \cdot D_n^N(v), \qquad (20.9)$$

for $0 \le u < M$, $0 \le v < N$. The coefficients $c_m$ and $c_n$ in Eqns. (20.7) and (20.9) are the same as in the 1D case (Eqn. (20.3)). Notice that in the forward transform (and only there!) the factors $c_m, c_n$ are independent of the iteration variables $u, v$ and can thus be placed outside the summation (as shown in Eqn. (20.7)).

Prog. 20.1
1D DCT (Java implementation). The method DCT() computes the forward transform for a real-valued signal vector g of arbitrary length according to the definition in Eqn. (20.1). The method returns the DCT spectrum as a real-valued vector of the same length as the input vector g. The inverse transform iDCT() computes the inverse DCT for the real-valued cosine spectrum G.

     1  double[] DCT(double[] g) {  // forward DCT on signal g
     2    int M = g.length;
     3    double s = Math.sqrt(2.0 / M);  // common scale factor
     4    double[] G = new double[M];
     5    for (int m = 0; m < M; m++) {
     6      double cm = 1.0;
     7      if (m == 0)
     8        cm = 1.0 / Math.sqrt(2);
     9      double sum = 0;
    10      for (int u = 0; u < M; u++) {
    11        double Phi = Math.PI * m * (2 * u + 1) / (2 * M);
    12        sum += g[u] * cm * Math.cos(Phi);
    13      }
    14      G[m] = s * sum;
    15    }
    16    return G;
    17  }
    18
    19
    20  double[] iDCT(double[] G) {  // inverse DCT on spectrum G
    21    int M = G.length;
    22    double s = Math.sqrt(2.0 / M);  // common scale factor
    23    double[] g = new double[M];
    24    for (int u = 0; u < M; u++) {
    25      double sum = 0;
    26      for (int m = 0; m < M; m++) {
    27        double cm = 1.0;
    28        if (m == 0)
    29          cm = 1.0 / Math.sqrt(2);
    30        double Phi = Math.PI * m * (2 * u + 1) / (2 * M);
    31        sum += G[m] * cm * Math.cos(Phi);
    32      }
    33      g[u] = s * sum;
    34    }
    35    return g;
    36  }

20.2.1  Examples 

Figure 20.2 shows several examples of the DCT in comparison with the results of the DFT. Since the DCT spectrum is (in contrast to the DFT spectrum) not symmetric, it is not centered but displayed in its original form, with its origin at the upper left corner. For the (real-valued) DCT spectrum, the intensity corresponds to the logarithm of the absolute coefficient value; for the DFT, the usual logarithmic power spectrum is shown. Notice that the DCT is not simply a section of the DFT but evidently combines structures from adjacent quadrants of the Fourier spectrum.
Fig. 20.2
[Panels: Original, DFT, DCT]
2D DFT versus DCT. Both transforms show the frequency effects of image structures in a similar fashion. In the real-valued DCT spectrum (right), all coefficients are contained in a single quadrant and the frequency resolution is doubled compared with the DFT power spectrum (center). The DFT spectrum is centered as usual, while the origin of the DCT spectrum is located at the upper left corner. Both spectral plots display logarithmic intensity values.


20.2.2  Separability 


Similar to the DFT (see Ch. 19, Eqn. (19.9)), the 2D DCT can also be separated into two successive 1D transforms. To make this fact clear, the forward DCT (Eqn. (20.7)) can be expressed in the form


$$G(m,n) = \sqrt{\frac{2}{N}} \sum_{v=0}^{N-1} \Bigg[ \underbrace{\sqrt{\frac{2}{M}} \sum_{u=0}^{M-1} g(u,v) \cdot c_m \cdot D_m^M(u)}_{\text{1D DCT of } g(\cdot,\, v)} \Bigg] \cdot c_n \cdot D_n^N(v). \qquad (20.10)$$


The inner expression in Eqn. (20.10) corresponds to a 1D DCT of the $v$th row $g(\cdot, v)$ of the 2D signal function. Thus, as with the 2D DFT, one can first apply a 1D DCT to every row of an image and subsequently a 1D DCT to each column. Of course, one could equally follow the reverse order by doing a DCT on the columns first and then on the rows.
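The row-then-column procedure can be sketched in Java as follows. This is a minimal, unoptimized sketch: the class `SeparableDct2d` is a hypothetical name, and the 1D step simply re-evaluates Eqn. (20.1) without precomputed tables.

```java
// Sketch: separable 2D DCT (Eqn. 20.10). A 1D DCT is applied to every row,
// then to every column of the intermediate result. Arrays are indexed
// g[i][j] with i = row and j = column; any M x N size works.
class SeparableDct2d {

    // 1D forward DCT according to Eqn. (20.1); returns a new array
    static double[] dct1d(double[] g) {
        int M = g.length;
        double s = Math.sqrt(2.0 / M);
        double[] G = new double[M];
        for (int m = 0; m < M; m++) {
            double cm = (m == 0) ? 1.0 / Math.sqrt(2) : 1.0;
            double sum = 0;
            for (int u = 0; u < M; u++) {
                sum += g[u] * Math.cos(Math.PI * m * (2 * u + 1) / (2.0 * M));
            }
            G[m] = s * cm * sum;
        }
        return G;
    }

    // Forward 2D DCT; returns a new array, the input is not modified
    static double[][] dct2d(double[][] g) {
        int rows = g.length, cols = g[0].length;
        double[][] G = new double[rows][];
        for (int i = 0; i < rows; i++) {
            G[i] = dct1d(g[i]);            // step 1: transform each row
        }
        double[] col = new double[rows];
        for (int j = 0; j < cols; j++) {   // step 2: transform each column
            for (int i = 0; i < rows; i++) {
                col[i] = G[i][j];
            }
            double[] C = dct1d(col);
            for (int i = 0; i < rows; i++) {
                G[i][j] = C[i];
            }
        }
        return G;
    }

    public static void main(String[] args) {
        double[][] g = {{1, 2, 3, 4}, {7, 2, 0, 9}, {6, 5, 2, 5}, {0, 9, 8, 1}};
        for (double[] row : dct2d(g)) {
            System.out.println(java.util.Arrays.toString(row));
        }
    }
}
```

Applied to the 4 x 4 example signal in Eqn. (20.17), this should reproduce the spectrum in Eqn. (20.18) up to rounding; swapping the two loops (columns first) gives the same result.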

The  DCT  is  often  used  for  image  compression,  in  particular  for 
JPEG  where  the  size  of  the  transformed  sub-images  is  fixed  at  8  x  8 
and  the  processing  can  be  highly  optimized.  Applying  the  DCT  to 
square  images  (or  sub-images)  of  size  M  x  M  is  indeed  an  important 
special  case.  Here  the  DCT  is  commonly  expressed  in  matrix  form, 

$$\mathbf{G} = \mathbf{A} \cdot \mathbf{g} \cdot \mathbf{A}^\mathsf{T}, \qquad (20.11)$$


where the matrices $\mathbf{g}$ and $\mathbf{G}$ (both of size $M \times M$) represent the 2D signal and the resulting DCT spectrum, respectively. $\mathbf{A}$ is a square, real-valued transformation matrix with the elements (cf. Eqn. (20.1))

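$$A_{i,j} = \sqrt{\frac{2}{M}} \cdot c_i \cdot \cos\!\Big(\frac{\pi(2j+1)i}{2M}\Big), \qquad (20.12)$$

with $0 \le i, j < M$ and $c_i$ as defined in Eqn. (20.3). The x/y separability of the DCT is easy to see in this notation. The matrix $\mathbf{A}$ is real-valued and orthonormal, i.e., $\mathbf{A} \cdot \mathbf{A}^\mathsf{T} = \mathbf{A}^\mathsf{T} \cdot \mathbf{A} = \mathbf{I}$, and so its transpose $\mathbf{A}^\mathsf{T}$ is identical to the inverse matrix $\mathbf{A}^{-1}$. Thus the associated inverse transformation from the DCT spectrum $\mathbf{G}$ back to the signal $\mathbf{g}$ can be carried out in the form

$$\mathbf{g} = \mathbf{A}^\mathsf{T} \cdot \mathbf{G} \cdot \mathbf{A}, \qquad (20.13)$$

with the same matrices $\mathbf{A}$ and $\mathbf{A}^\mathsf{T}$ as used in the forward transform.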
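For example, for $M = 4$ the DCT transformation matrix is

$$\mathbf{A} = \begin{pmatrix} A_{0,0} & A_{0,1} & A_{0,2} & A_{0,3}\\ A_{1,0} & A_{1,1} & A_{1,2} & A_{1,3}\\ A_{2,0} & A_{2,1} & A_{2,2} & A_{2,3}\\ A_{3,0} & A_{3,1} & A_{3,2} & A_{3,3} \end{pmatrix} \qquad (20.14)$$

$$= \begin{pmatrix} \frac{1}{2}\cos(0) & \frac{1}{2}\cos(0) & \frac{1}{2}\cos(0) & \frac{1}{2}\cos(0)\\ \frac{1}{\sqrt{2}}\cos(\frac{\pi}{8}) & \frac{1}{\sqrt{2}}\cos(\frac{3\pi}{8}) & \frac{1}{\sqrt{2}}\cos(\frac{5\pi}{8}) & \frac{1}{\sqrt{2}}\cos(\frac{7\pi}{8})\\ \frac{1}{\sqrt{2}}\cos(\frac{2\pi}{8}) & \frac{1}{\sqrt{2}}\cos(\frac{6\pi}{8}) & \frac{1}{\sqrt{2}}\cos(\frac{10\pi}{8}) & \frac{1}{\sqrt{2}}\cos(\frac{14\pi}{8})\\ \frac{1}{\sqrt{2}}\cos(\frac{3\pi}{8}) & \frac{1}{\sqrt{2}}\cos(\frac{9\pi}{8}) & \frac{1}{\sqrt{2}}\cos(\frac{15\pi}{8}) & \frac{1}{\sqrt{2}}\cos(\frac{21\pi}{8}) \end{pmatrix} \qquad (20.15)$$

$$= \begin{pmatrix} 0.50000 & 0.50000 & 0.50000 & 0.50000\\ 0.65328 & 0.27060 & -0.27060 & -0.65328\\ 0.50000 & -0.50000 & -0.50000 & 0.50000\\ 0.27060 & -0.65328 & 0.65328 & -0.27060 \end{pmatrix}. \qquad (20.16)$$

For the arbitrarily chosen 2D signal (i.e., "image")

$$\mathbf{g} = \begin{pmatrix} 1 & 2 & 3 & 4\\ 7 & 2 & 0 & 9\\ 6 & 5 & 2 & 5\\ 0 & 9 & 8 & 1 \end{pmatrix}, \qquad (20.17)$$

for example, the DCT spectrum obtained with Eqn. (20.11) is

$$\mathbf{G} = \mathbf{A} \cdot \mathbf{g} \cdot \mathbf{A}^\mathsf{T} = \begin{pmatrix} 16.00000 & -0.95671 & 0.50000 & -2.30970\\ -2.61313 & -1.81066 & 6.57924 & 0.45711\\ -2.00000 & -1.65642 & -8.50000 & 1.22731\\ -1.08239 & 0.95711 & -1.10162 & 0.31066 \end{pmatrix}, \qquad (20.18)$$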


which  is  the  same  as  the  result  from  Eqn.  (20.7)  or,  alternatively, 
Eqn.  (20.10). 
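The matrix formulation lends itself to a compact sketch. The hypothetical class `DctMatrix` below builds $\mathbf{A}$ from Eqn. (20.12) and computes $\mathbf{G} = \mathbf{A}\cdot\mathbf{g}\cdot\mathbf{A}^\mathsf{T}$ and the inverse $\mathbf{g} = \mathbf{A}^\mathsf{T}\cdot\mathbf{G}\cdot\mathbf{A}$ with plain nested loops (no linear algebra library is assumed), so it can be used to check the 4 x 4 example above.

```java
// Sketch of the matrix form of the DCT (Eqns. 20.11-20.13) for square
// M x M blocks, using plain arrays and nested loops.
class DctMatrix {

    // A[i][j] = sqrt(2/M) * c_i * cos(pi * (2j + 1) * i / (2M)), Eqn. (20.12)
    static double[][] dctMatrix(int M) {
        double[][] A = new double[M][M];
        double s = Math.sqrt(2.0 / M);
        for (int i = 0; i < M; i++) {
            double ci = (i == 0) ? 1.0 / Math.sqrt(2) : 1.0;
            for (int j = 0; j < M; j++) {
                A[i][j] = s * ci * Math.cos(Math.PI * (2 * j + 1) * i / (2.0 * M));
            }
        }
        return A;
    }

    static double[][] transpose(double[][] X) {
        double[][] T = new double[X[0].length][X.length];
        for (int i = 0; i < X.length; i++) {
            for (int j = 0; j < X[0].length; j++) {
                T[j][i] = X[i][j];
            }
        }
        return T;
    }

    // Matrix product X * Y (X is n x k, Y is k x p)
    static double[][] multiply(double[][] X, double[][] Y) {
        int n = X.length, k = Y.length, p = Y[0].length;
        double[][] Z = new double[n][p];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < p; j++) {
                double sum = 0;
                for (int l = 0; l < k; l++) {
                    sum += X[i][l] * Y[l][j];
                }
                Z[i][j] = sum;
            }
        }
        return Z;
    }

    // G = A * g * A^T (Eqn. 20.11)
    static double[][] forward(double[][] A, double[][] g) {
        return multiply(multiply(A, g), transpose(A));
    }

    // g = A^T * G * A (Eqn. 20.13), valid because A is orthonormal
    static double[][] inverse(double[][] A, double[][] G) {
        return multiply(multiply(transpose(A), G), A);
    }

    public static void main(String[] args) {
        double[][] A = dctMatrix(4);
        double[][] g = {{1, 2, 3, 4}, {7, 2, 0, 9}, {6, 5, 2, 5}, {0, 9, 8, 1}};
        for (double[] row : forward(A, g)) {
            System.out.println(java.util.Arrays.toString(row));
        }
    }
}
```

For a fixed block size such as 8 x 8, $\mathbf{A}$ would typically be computed once and reused for all blocks.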

The matrix notation of the DCT, as shown in Eqn. (20.11) and Eqn. (20.13), is particularly useful for describing the transformation of small, fixed-size sub-images. This is an important component common to most image and video compression methods (including JPEG and MPEG) and calls for efficient implementations.


20.3  Java  Implementation 

A straightforward Java implementation of the one- and two-dimensional DCT is available online as part of the imagingbook library.1 For efficiency reasons, the following methods generally work "in place", i.e., the supplied data array is destructively modified by the transformation.

Dct1d (class)

This class implements the 1D DCT (see also Prog. 20.1):

Dct1d (int M)

Constructor; M denotes the length of the expected signal.

void DCT (double[] g)

Calculates the DCT spectrum of the one-dimensional signal g. The array g is modified, its content being replaced by the resulting spectrum.

void iDCT (double[] G)

Reconstructs the original signal from the one-dimensional DCT spectrum G. The array G is modified, its content being replaced by the reconstructed signal.

Pre-calculated cosine tables are used in both the forward and inverse transformation for efficient processing.

Dct2d (class)

This class implements the 2D DCT (by using class Dct1d):

Dct2d ()

Constructor; in this case no dimension arguments are required.


1 Package imagingbook.pub.dct.
void DCT (float[][] g)

Calculates the DCT spectrum of the 2D signal g. The array g is modified.

void iDCT (float[][] G)

Reconstructs the original signal from the two-dimensional DCT spectrum G. The array G is modified.

FloatProcessor DCT (FloatProcessor g)

Calculates the DCT spectrum of the image g and returns a new image with the resulting spectrum (g is not modified).

FloatProcessor iDCT (FloatProcessor G)

Calculates the inverse DCT from the 2D spectrum G and returns the reconstructed image (G is not modified).
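The library source is not reproduced here. As a hypothetical sketch of the in-place, table-driven approach described for Dct1d, the class `TabulatedDct1d` below (an illustrative name, not the imagingbook class) precomputes an $M \times M$ table of scaled cosine values once in its constructor and then overwrites the supplied array. Exercise 20.2 hints that a more compact table of length $4M$ suffices; the full table is used here only because it is simpler.

```java
// Hypothetical sketch (not the imagingbook Dct1d source): a 1D DCT that
// precomputes an M x M table of scaled cosine values once and then
// transforms arrays in place.
class TabulatedDct1d {
    private final int M;
    private final double[][] table; // table[m][u] = sqrt(2/M) * c_m * cos(pi*m*(2u+1)/(2M))
    private final double[] tmp;     // scratch buffer for the in-place update

    TabulatedDct1d(int M) {
        this.M = M;
        this.table = new double[M][M];
        this.tmp = new double[M];
        double s = Math.sqrt(2.0 / M);
        for (int m = 0; m < M; m++) {
            double cm = (m == 0) ? 1.0 / Math.sqrt(2) : 1.0;
            for (int u = 0; u < M; u++) {
                table[m][u] = s * cm * Math.cos(Math.PI * m * (2 * u + 1) / (2.0 * M));
            }
        }
    }

    // Forward DCT (Eqn. 20.1); g is replaced by its spectrum
    void DCT(double[] g) {
        checkLength(g);
        for (int m = 0; m < M; m++) {
            double sum = 0;
            for (int u = 0; u < M; u++) {
                sum += table[m][u] * g[u];
            }
            tmp[m] = sum;
        }
        System.arraycopy(tmp, 0, g, 0, M);
    }

    // Inverse DCT (Eqn. 20.2); G is replaced by the reconstructed signal
    void iDCT(double[] G) {
        checkLength(G);
        for (int u = 0; u < M; u++) {
            double sum = 0;
            for (int m = 0; m < M; m++) {
                sum += table[m][u] * G[m];
            }
            tmp[u] = sum;
        }
        System.arraycopy(tmp, 0, G, 0, M);
    }

    private void checkLength(double[] a) {
        if (a.length != M) {
            throw new IllegalArgumentException("expected array length " + M);
        }
    }

    public static void main(String[] args) {
        TabulatedDct1d dct = new TabulatedDct1d(4);
        double[] g = {1, 2, 3, 4};
        dct.DCT(g);   // g now holds the spectrum
        System.out.println(java.util.Arrays.toString(g));
        dct.iDCT(g);  // g holds the reconstructed signal again
        System.out.println(java.util.Arrays.toString(g));
    }
}
```

Because the scratch buffer `tmp` is shared by both methods, a single instance of this sketch should not be used from several threads at the same time.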


20.4  Other  Spectral  Transforms 

Evidently, the Fourier transform is not the only way to represent a given signal in frequency space; in fact, numerous similar transforms exist. Some of these, such as the discrete cosine transform, also use sinusoidal basis functions, while others, such as the Hadamard transform (also known as the Walsh transform), build on binary (±1-valued) functions [43,126].

All of these transforms are of global nature; i.e., the value of any spectral coefficient is equally influenced by all signal values, independent of the spatial position in the signal. Thus a peak in the spectrum could be caused by a high-amplitude event of local extent as well as by a widespread, continuous wave of low amplitude. Global transforms are therefore of limited use for the purpose of detecting or analyzing local events because they are incapable of capturing the spatial position and extent of events in a signal.

A solution to this problem is to use a set of local, spatially limited basis functions ("wavelets") in place of the global, spatially fixed basis functions. The corresponding wavelet transform, of which several versions have been proposed, allows the simultaneous localization of repetitive signal components in both signal space and frequency space [158].


20.5  Exercises 

Exercise 20.1. Implement an efficient ("hard-coded") Java method for computing the 1D DCT of length M = 8 that operates without iterations (loops) and contains all necessary coefficients as precomputed constants.

Exercise 20.2. Consider how the implementation of the one-dimensional DCT in Prog. 20.1 could be accelerated by using a pre-calculated table of the cosine values (for a given signal length M). Hint: A table of length 4M is required.

Exercise 20.3. Verify by numerical computation that the DCT basis functions $D_m^M(u)$ for $0 \le m, u < M$ (Eqn. (20.5)) are pairwise orthogonal; i.e., the inner product $D_m^M \cdot D_n^M$ of the vectors is zero for any pair $m \neq n$.

Exercise 20.4. Implement the 2D DCT (Sec. 20.2) as an ImageJ plugin for images of arbitrary size. Make use of the fact that the 2D DCT can be performed as a sequence of 1D transforms (see Eqn. (20.10)).

Exercise  20.5.  Verify  for  the  4x4  DCT  example  in  Eqn.  (20.18) 
that  the  result  of  the  inverse  transformation  in  Eqn.  (20.13)  is  really 
identical  to  the  original  signal  g  in  Eqn.  (20.17). 

Exercise  20.6.  Show  that  the  M  x  M  matrix  A  (with  elements  as 
defined  in  Eqn.  (20.12))  is  really  orthonormal,  i.e.,  A  •  AT  =  I. 
Common to all the filters and point operations described so far is the fact that they may change the intensity function of an image but the position of each pixel, and thus the geometry of the image, remains the same. The purpose of geometric operations, which are discussed in this chapter, is to deform an image by altering its geometry. Typical examples are shifting, rotating, or scaling images, as shown in Fig. 21.1. Geometric operations are frequently needed in practical applications, for example, in virtually any modern graphical computer interface. Today we take for granted that windows and images in graphic or video applications can be zoomed continuously to arbitrary size. Geometric image operations are also important in computer graphics, where textures, which are usually raster images, are deformed to be mapped onto the corresponding 3D surfaces, possibly in real time. Of course, geometric operations are not as simple as their ubiquity may suggest. While it is obvious, for example, that an image could be enlarged by some integer factor n simply by replicating each pixel n x n times, the results would probably not be appealing, and this approach gives no immediate idea of how to handle continuous scaling, rotating images, or other image deformations. In general, geometric operations that achieve high-quality results are not trivial to implement and are also computationally demanding, even on today's fast computers.

In principle, a geometric operation transforms a given image $I$ to a new image $I'$ by modifying the coordinates of image pixels,

$$I'(x', y') \leftarrow I(x, y), \qquad (21.1)$$

that is, the value of the image function $I$ at the original location $(x, y)$ moves to the new position $(x', y')$ in the transformed image $I'$. Thus (at least in the continuous case) the values of the image elements do not change, only their positions.

To  model  this  process,  we  first  need  a  2D  transformation  function 
or  geometric  mapping  T,  for  example,  in  the  form 

$$T: \mathbb{R}^2 \to \mathbb{R}^2, \qquad (21.2)$$

Fig. 21.1
Typical examples of geometric operations: original image (a), translation (b), scaling (contracting or stretching) in x and y directions (c), rotation about the center (d), projective transformation (e), and nonlinear distortion (f).


that specifies for each original 2D coordinate point $\boldsymbol{x} = (x, y)$ the corresponding target point $\boldsymbol{x}' = (x', y')$ in the new image $I'$,

$$(x', y') = T(x, y). \qquad (21.3)$$

Notice that the coordinates $(x, y)$ and $(x', y')$ specify points in the continuous image plane $\mathbb{R} \times \mathbb{R}$. The main problem in transforming digital images is that the pixels $I(u, v)$ are defined not on a continuous plane but on a discrete raster $\mathbb{Z} \times \mathbb{Z}$. Obviously, a transformed coordinate $(u', v') = T(u, v)$ produced by the mapping function $T()$ will, in general, no longer fall onto a discrete raster point. The solution to this problem is to compute intermediate pixel values for the transformed image by a process called interpolation (see Ch. 22), which is the second essential element in any geometric operation.


21.1  2D  Coordinate  Transformations 

The mapping function $T()$ in Eqn. (21.3) is an arbitrary continuous function that, for reasons of simplicity, is often specified as two separate functions,

$$x' = T_x(x, y) \quad\text{and}\quad y' = T_y(x, y) \qquad (21.4)$$

for the x and y components, respectively.

21.1.1  Simple  Geometric  Mappings 

The simple mapping functions include translation, scaling, shearing, and rotation, defined as follows:
Translation (shift) by a vector $(d_x, d_y)$:

$$T_x: x' = x + d_x, \quad T_y: y' = y + d_y \quad\text{or}\quad \begin{pmatrix} x'\\ y' \end{pmatrix} = \begin{pmatrix} x\\ y \end{pmatrix} + \begin{pmatrix} d_x\\ d_y \end{pmatrix}. \qquad (21.5)$$


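Scaling (contracting or stretching) along the x or y axis by the factor $s_x$ or $s_y$, respectively:

$$T_x: x' = s_x \cdot x, \quad T_y: y' = s_y \cdot y \quad\text{or}\quad \begin{pmatrix} x'\\ y' \end{pmatrix} = \begin{pmatrix} s_x & 0\\ 0 & s_y \end{pmatrix} \cdot \begin{pmatrix} x\\ y \end{pmatrix}. \qquad (21.6)$$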


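Shearing along the x and y axis by the factors $b_x$ and $b_y$, respectively (for shearing in only one direction, the other factor is set to zero):

$$T_x: x' = x + b_x \cdot y, \quad T_y: y' = y + b_y \cdot x \quad\text{or}\quad \begin{pmatrix} x'\\ y' \end{pmatrix} = \begin{pmatrix} 1 & b_x\\ b_y & 1 \end{pmatrix} \cdot \begin{pmatrix} x\\ y \end{pmatrix}. \qquad (21.7)$$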


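Rotation by an angle $\alpha$, with the coordinate origin being the center of rotation:

$$T_x: x' = x \cdot \cos\alpha - y \cdot \sin\alpha, \quad T_y: y' = x \cdot \sin\alpha + y \cdot \cos\alpha \qquad (21.8)$$

or

$$\begin{pmatrix} x'\\ y' \end{pmatrix} = \begin{pmatrix} \cos\alpha & -\sin\alpha\\ \sin\alpha & \cos\alpha \end{pmatrix} \cdot \begin{pmatrix} x\\ y \end{pmatrix}. \qquad (21.9)$$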


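Rotating the image by an angle $\alpha$ around an arbitrary center point $\boldsymbol{x}_c = (x_c, y_c)$ is accomplished by first translating the image by $(-x_c, -y_c)$, such that $\boldsymbol{x}_c$ coincides with the origin, then rotating the image about the origin (as in Eqn. (21.9)), and finally shifting the image back by $(x_c, y_c)$. The resulting composite transformation is

$$T_x: x' = x_c + (x - x_c) \cdot \cos\alpha - (y - y_c) \cdot \sin\alpha, \quad T_y: y' = y_c + (x - x_c) \cdot \sin\alpha + (y - y_c) \cdot \cos\alpha \qquad (21.10)$$

or

$$\begin{pmatrix} x'\\ y' \end{pmatrix} = \begin{pmatrix} x_c\\ y_c \end{pmatrix} + \begin{pmatrix} \cos\alpha & -\sin\alpha\\ \sin\alpha & \cos\alpha \end{pmatrix} \cdot \begin{pmatrix} x - x_c\\ y - y_c \end{pmatrix}. \qquad (21.11)$$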


Combinations of the operations listed in Eqns. (21.5)-(21.9) constitute the important class of "affine" transformations or affine mappings (see also Sec. 21.1.3).


21.1.2  Homogeneous  Coordinates 

To simplify the concatenation of linear mappings, it is advantageous to specify all operations in the form of vector-matrix multiplications, as in Eqns. (21.6)-(21.9). Note that pure translation (Eqn. (21.5)), which corresponds to a vector addition, cannot be formulated as a vector-matrix multiplication. Fortunately, this difficulty can be elegantly resolved with so-called homogeneous coordinates (see, e.g., [75, p. 204]).1

1 See also Sec. B.5 in the Appendix.
To turn an "ordinary" (i.e., Cartesian) coordinate into a homogeneous coordinate, the original vector is simply extended by an additional element with constant value 1. For example, a 2D Cartesian point $\boldsymbol{x} = (x, y)^\mathsf{T}$ converts to a 3D vector,

$$\underline{\boldsymbol{x}} = \mathrm{hom}(\boldsymbol{x}) = \mathrm{hom}\begin{pmatrix} x\\ y \end{pmatrix} = \begin{pmatrix} x\\ y\\ 1 \end{pmatrix}. \qquad (21.12)$$


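Note that the homogeneous representation is not unique: any scaled vector $s \cdot \underline{\boldsymbol{x}}$ is an equivalent homogeneous representation of the Cartesian coordinate $\boldsymbol{x}$, that is,

$$\boldsymbol{x} = \mathrm{hom}^{-1}(\underline{\boldsymbol{x}}) = \mathrm{hom}^{-1}(s \cdot \underline{\boldsymbol{x}}), \qquad (21.13)$$

for any nonzero $s \in \mathbb{R}$. For example, the homogeneous coordinates $\underline{\boldsymbol{x}}_1 = (3, 2, 1)^\mathsf{T}$, $\underline{\boldsymbol{x}}_2 = (-6, -4, -2)^\mathsf{T}$, and $\underline{\boldsymbol{x}}_3 = (30, 20, 10)^\mathsf{T}$ are all equivalent representations of the same Cartesian coordinate $\boldsymbol{x} = (3, 2)^\mathsf{T}$.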

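The reverse mapping from a 3D homogeneous coordinate $\underline{\boldsymbol{x}} = (\hat{x}, \hat{y}, z)^\mathsf{T}$ to the corresponding 2D Cartesian coordinate is denoted

$$\mathrm{hom}^{-1}(\underline{\boldsymbol{x}}) = \mathrm{hom}^{-1}\begin{pmatrix} \hat{x}\\ \hat{y}\\ z \end{pmatrix} = \frac{1}{z} \begin{pmatrix} \hat{x}\\ \hat{y} \end{pmatrix} = \begin{pmatrix} x\\ y \end{pmatrix} = \boldsymbol{x}, \qquad (21.14)$$

for $z \neq 0$.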


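With the help of homogeneous coordinates, we can now define a 2D translation (Eqn. (21.5)) as a vector-matrix product in the form

$$\begin{pmatrix} x'\\ y' \end{pmatrix} = \mathrm{hom}^{-1}\left[\begin{pmatrix} 1 & 0 & d_x\\ 0 & 1 & d_y\\ 0 & 0 & 1 \end{pmatrix} \cdot \mathrm{hom}\begin{pmatrix} x\\ y \end{pmatrix}\right] \qquad (21.15)$$

$$= \mathrm{hom}^{-1}\left[\begin{pmatrix} 1 & 0 & d_x\\ 0 & 1 & d_y\\ 0 & 0 & 1 \end{pmatrix} \cdot \begin{pmatrix} x\\ y\\ 1 \end{pmatrix}\right] = \begin{pmatrix} x + d_x\\ y + d_y \end{pmatrix}, \qquad (21.16)$$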


which had been our motive for introducing homogeneous coordinates in the first place. As we shall see in the following sections, homogeneous coordinates allow us to write many common 2D coordinate transformations in the form

$$\underline{\boldsymbol{x}}' = \mathbf{A} \cdot \underline{\boldsymbol{x}}, \qquad (21.17)$$


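where $\mathbf{A}$ is a $3 \times 3$ matrix. Note that (due to the relation in Eqn. (21.13)) multiplying the matrix $\mathbf{A}$ by some scalar factor $s$ yields the same transformation in terms of Cartesian coordinates, that is,

$$\boldsymbol{x}' = \mathrm{hom}^{-1}[\mathbf{A} \cdot \underline{\boldsymbol{x}}] = \mathrm{hom}^{-1}[s \cdot (\mathbf{A} \cdot \underline{\boldsymbol{x}})] = \mathrm{hom}^{-1}[(s \cdot \mathbf{A}) \cdot \underline{\boldsymbol{x}}], \qquad (21.18)$$

for any nonzero $s \in \mathbb{R}$.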
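The following sketch expresses Eqns. (21.12)-(21.18) in Java. The class `Homogeneous` and its method names are hypothetical; points are plain `double[]` arrays, and `homInv` rejects $z = 0$, for which no Cartesian equivalent exists.

```java
// Hypothetical helper (illustration only) for homogeneous 2D coordinates,
// following Eqns. (21.12)-(21.18).
class Homogeneous {

    // hom(x): Cartesian (x, y) -> homogeneous (x, y, 1)
    static double[] hom(double[] x) {
        return new double[] {x[0], x[1], 1.0};
    }

    // hom^{-1}: homogeneous (xh, yh, z) -> Cartesian (xh / z, yh / z)
    static double[] homInv(double[] xh) {
        if (xh[2] == 0) {
            throw new IllegalArgumentException("z = 0 has no Cartesian equivalent");
        }
        return new double[] {xh[0] / xh[2], xh[1] / xh[2]};
    }

    // 3x3 matrix times homogeneous 3-vector
    static double[] multiply(double[][] A, double[] v) {
        double[] r = new double[3];
        for (int i = 0; i < 3; i++) {
            r[i] = A[i][0] * v[0] + A[i][1] * v[1] + A[i][2] * v[2];
        }
        return r;
    }

    // Translation matrix of Eqn. (21.15)
    static double[][] translation(double dx, double dy) {
        return new double[][] {{1, 0, dx}, {0, 1, dy}, {0, 0, 1}};
    }

    // Returns the scaled matrix s * A
    static double[][] scale(double[][] A, double s) {
        double[][] B = new double[3][3];
        for (int i = 0; i < 3; i++) {
            for (int j = 0; j < 3; j++) {
                B[i][j] = s * A[i][j];
            }
        }
        return B;
    }

    // x' = hom^{-1}[A * hom(x)]
    static double[] apply(double[][] A, double[] x) {
        return homInv(multiply(A, hom(x)));
    }

    public static void main(String[] args) {
        double[][] T = translation(5, -1);
        double[] p = {3, 2};
        System.out.println(java.util.Arrays.toString(apply(T, p)));              // [8.0, 1.0]
        System.out.println(java.util.Arrays.toString(apply(scale(T, -2.5), p))); // [8.0, 1.0]
    }
}
```

The second line of output illustrates Eqn. (21.18): scaling the matrix by a nonzero factor does not change the Cartesian result.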


21.1.3  Affine  (Three-Point)  Mapping 

In general, and analogous to Eqn. (21.16), we can express any combination of 2D translation, scaling, and rotation as vector-matrix multiplication in homogeneous coordinates in the form
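$$\underline{\boldsymbol{x}}' = \mathbf{A}_\mathrm{affine} \cdot \underline{\boldsymbol{x}} = \begin{pmatrix} a_{00} & a_{01} & a_{02}\\ a_{10} & a_{11} & a_{12}\\ 0 & 0 & 1 \end{pmatrix} \cdot \begin{pmatrix} x\\ y\\ 1 \end{pmatrix}, \qquad (21.19)$$

or $\boldsymbol{x}' = \mathrm{hom}^{-1}[\mathbf{A}_\mathrm{affine} \cdot \mathrm{hom}(\boldsymbol{x})]$ in Cartesian coordinates, that is,

$$\begin{pmatrix} x'\\ y' \end{pmatrix} = \mathrm{hom}^{-1}\left[\begin{pmatrix} a_{00} & a_{01} & a_{02}\\ a_{10} & a_{11} & a_{12}\\ 0 & 0 & 1 \end{pmatrix} \cdot \begin{pmatrix} x\\ y\\ 1 \end{pmatrix}\right] = \begin{pmatrix} a_{00} & a_{01} & a_{02}\\ a_{10} & a_{11} & a_{12} \end{pmatrix} \cdot \begin{pmatrix} x\\ y\\ 1 \end{pmatrix}. \qquad (21.20)$$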

This 2D coordinate transformation is called an "affine mapping" with the six parameters $a_{00}, \ldots, a_{12}$, where $a_{02}, a_{12}$ specify the translation (equivalent to $d_x, d_y$ in Eqn. (21.5)) and $a_{00}, a_{01}, a_{10}, a_{11}$ aggregate the scaling, shearing, and rotation coefficients (see Eqns. (21.6)-(21.9)). For example, the affine transformation matrix for a rotation about the origin by an angle $\alpha$ is specified by the matrix

$$\mathbf{A}_\mathrm{rot} = \begin{pmatrix} a_{00} & a_{01} & a_{02}\\ a_{10} & a_{11} & a_{12}\\ 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} \cos\alpha & -\sin\alpha & 0\\ \sin\alpha & \cos\alpha & 0\\ 0 & 0 & 1 \end{pmatrix}. \qquad (21.21)$$


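In this way, compound transformations can be constructed easily by consecutive matrix multiplications (from right to left). For example, the transformation matrix for a rotation by $\alpha$ about a given center point $\boldsymbol{x}_c = (x_c, y_c)^\mathsf{T}$ (see Eqn. (21.11)), composed of a translation to the origin followed by a rotation and another translation, is

$$\mathbf{A} = \underbrace{\begin{pmatrix} 1 & 0 & x_c\\ 0 & 1 & y_c\\ 0 & 0 & 1 \end{pmatrix}}_{\text{translation by } (x_c,\, y_c)^\mathsf{T}} \cdot \underbrace{\begin{pmatrix} \cos\alpha & -\sin\alpha & 0\\ \sin\alpha & \cos\alpha & 0\\ 0 & 0 & 1 \end{pmatrix}}_{\text{rotation by } \alpha \text{ (about the origin)}} \cdot \underbrace{\begin{pmatrix} 1 & 0 & -x_c\\ 0 & 1 & -y_c\\ 0 & 0 & 1 \end{pmatrix}}_{\text{translation by } (-x_c,\, -y_c)^\mathsf{T}} \qquad (21.22)$$

$$= \begin{pmatrix} 1 & 0 & x_c\\ 0 & 1 & y_c\\ 0 & 0 & 1 \end{pmatrix} \cdot \begin{pmatrix} \cos\alpha & -\sin\alpha & -x_c\cos\alpha + y_c\sin\alpha\\ \sin\alpha & \cos\alpha & -x_c\sin\alpha - y_c\cos\alpha\\ 0 & 0 & 1 \end{pmatrix} \qquad (21.23)$$

$$= \begin{pmatrix} \cos\alpha & -\sin\alpha & x_c(1 - \cos\alpha) + y_c\sin\alpha\\ \sin\alpha & \cos\alpha & y_c(1 - \cos\alpha) - x_c\sin\alpha\\ 0 & 0 & 1 \end{pmatrix}. \qquad (21.24)$$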


Of  course,  the  result  is  the  same  as  in  Eqn.  (21.10). 
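Composing matrices from right to left is easy to check numerically. The hypothetical class `AffineCompose` below builds the matrix of Eqn. (21.22) from three 3 x 3 factors and compares the mapped point with the direct formula of Eqn. (21.10); the two results should agree up to rounding errors.

```java
// Hypothetical helper (illustration only): composes homogeneous 3x3 matrices
// to rotate about a center point, Eqns. (21.22)-(21.24), and compares the
// result with the direct formula of Eqn. (21.10).
class AffineCompose {

    static double[][] multiply(double[][] X, double[][] Y) {
        double[][] Z = new double[3][3];
        for (int i = 0; i < 3; i++) {
            for (int j = 0; j < 3; j++) {
                for (int k = 0; k < 3; k++) {
                    Z[i][j] += X[i][k] * Y[k][j];
                }
            }
        }
        return Z;
    }

    static double[][] translation(double dx, double dy) {
        return new double[][] {{1, 0, dx}, {0, 1, dy}, {0, 0, 1}};
    }

    static double[][] rotation(double alpha) {
        double c = Math.cos(alpha), s = Math.sin(alpha);
        return new double[][] {{c, -s, 0}, {s, c, 0}, {0, 0, 1}};
    }

    // T(xc, yc) * R(alpha) * T(-xc, -yc); the rightmost matrix acts first
    static double[][] rotationAbout(double alpha, double xc, double yc) {
        return multiply(translation(xc, yc),
                multiply(rotation(alpha), translation(-xc, -yc)));
    }

    // x' = hom^{-1}[A * hom(x)]
    static double[] apply(double[][] A, double x, double y) {
        double xh = A[0][0] * x + A[0][1] * y + A[0][2];
        double yh = A[1][0] * x + A[1][1] * y + A[1][2];
        double z = A[2][0] * x + A[2][1] * y + A[2][2];
        return new double[] {xh / z, yh / z};
    }

    // Direct evaluation of Eqn. (21.10)
    static double[] rotateDirect(double alpha, double xc, double yc, double x, double y) {
        double c = Math.cos(alpha), s = Math.sin(alpha);
        return new double[] {
            xc + (x - xc) * c - (y - yc) * s,
            yc + (x - xc) * s + (y - yc) * c
        };
    }

    public static void main(String[] args) {
        double alpha = Math.toRadians(30), xc = 100, yc = 50;
        double[][] A = rotationAbout(alpha, xc, yc);
        System.out.println(java.util.Arrays.toString(apply(A, 120, 80)));
        System.out.println(java.util.Arrays.toString(rotateDirect(alpha, xc, yc, 120, 80)));
    }
}
```

Once composed, the single matrix `A` can be applied to every pixel coordinate, which is cheaper than applying the three factors separately.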

Note that multiplying two affine transformation matrices always yields another affine transformation. Also, an affine transformation maps straight lines to straight lines, triangles to triangles, and rectangles to parallelograms, as illustrated in Fig. 21.2. The distance ratio between points on a straight line remains unchanged by this type of mapping function.


Affine  transformation  parameters  from  three  point  pairs 

Fig. 21.2
Affine mapping. An affine 2D transformation is uniquely specified by three pairs of corresponding points; for example, $(\boldsymbol{x}_0, \boldsymbol{x}'_0)$, $(\boldsymbol{x}_1, \boldsymbol{x}'_1)$, and $(\boldsymbol{x}_2, \boldsymbol{x}'_2)$.

The six parameters of the 2D affine mapping (Eqn. (21.20)) are uniquely determined by three pairs of corresponding points $(\boldsymbol{x}_0, \boldsymbol{x}'_0)$, $(\boldsymbol{x}_1, \boldsymbol{x}'_1)$, $(\boldsymbol{x}_2, \boldsymbol{x}'_2)$, with the first point $\boldsymbol{x}_i = (x_i, y_i)$ of each pair located in the original image and the corresponding point $\boldsymbol{x}'_i = (x'_i, y'_i)$ located in the target image. From these six points, the six transformation parameters $a_{00}, \ldots, a_{12}$ are derived by solving the system of linear equations

$$\begin{aligned} x'_0 &= a_{00} \cdot x_0 + a_{01} \cdot y_0 + a_{02}, & y'_0 &= a_{10} \cdot x_0 + a_{11} \cdot y_0 + a_{12},\\ x'_1 &= a_{00} \cdot x_1 + a_{01} \cdot y_1 + a_{02}, & y'_1 &= a_{10} \cdot x_1 + a_{11} \cdot y_1 + a_{12},\\ x'_2 &= a_{00} \cdot x_2 + a_{01} \cdot y_2 + a_{02}, & y'_2 &= a_{10} \cdot x_2 + a_{11} \cdot y_2 + a_{12}, \end{aligned} \qquad (21.25)$$

provided that the points $\boldsymbol{x}_0, \boldsymbol{x}_1, \boldsymbol{x}_2$ do not lie on a common straight line (i.e., their homogeneous coordinate vectors are linearly independent). Since Eqn. (21.25) consists of two independent sets of linear $3 \times 3$ equations for $x'_i$ and $y'_i$, the solution can be written in closed form as


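$$\begin{aligned}
a_{00} &= \tfrac{1}{d} \big[ y_0 (x'_1 - x'_2) + y_1 (x'_2 - x'_0) + y_2 (x'_0 - x'_1) \big],\\
a_{01} &= \tfrac{1}{d} \big[ x_0 (x'_2 - x'_1) + x_1 (x'_0 - x'_2) + x_2 (x'_1 - x'_0) \big],\\
a_{10} &= \tfrac{1}{d} \big[ y_0 (y'_1 - y'_2) + y_1 (y'_2 - y'_0) + y_2 (y'_0 - y'_1) \big],\\
a_{11} &= \tfrac{1}{d} \big[ x_0 (y'_2 - y'_1) + x_1 (y'_0 - y'_2) + x_2 (y'_1 - y'_0) \big],\\
a_{02} &= \tfrac{1}{d} \big[ x_0 (y_2 x'_1 - y_1 x'_2) + x_1 (y_0 x'_2 - y_2 x'_0) + x_2 (y_1 x'_0 - y_0 x'_1) \big],\\
a_{12} &= \tfrac{1}{d} \big[ x_0 (y_2 y'_1 - y_1 y'_2) + x_1 (y_0 y'_2 - y_2 y'_0) + x_2 (y_1 y'_0 - y_0 y'_1) \big],
\end{aligned} \qquad (21.26)$$

with $d = x_0 (y_2 - y_1) + x_1 (y_0 - y_2) + x_2 (y_1 - y_0)$.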
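Eqn. (21.26) translates directly into code. The sketch below (hypothetical class `AffineFromPoints`) returns the six parameters in the order $a_{00}, a_{01}, a_{02}, a_{10}, a_{11}, a_{12}$ and rejects (nearly) collinear source points; the absolute threshold on $d$ is an arbitrary assumption and would need to be scaled to the coordinate range in practice.

```java
// Hypothetical helper (illustration only): closed-form affine parameters
// from three point pairs, Eqn. (21.26).
class AffineFromPoints {

    // p[i] = {x_i, y_i} (source), q[i] = {x'_i, y'_i} (target), i = 0, 1, 2.
    // Returns {a00, a01, a02, a10, a11, a12}.
    static double[] solve(double[][] p, double[][] q) {
        double x0 = p[0][0], y0 = p[0][1];
        double x1 = p[1][0], y1 = p[1][1];
        double x2 = p[2][0], y2 = p[2][1];
        // u_i = x'_i and v_i = y'_i (target coordinates)
        double u0 = q[0][0], v0 = q[0][1];
        double u1 = q[1][0], v1 = q[1][1];
        double u2 = q[2][0], v2 = q[2][1];

        double d = x0 * (y2 - y1) + x1 * (y0 - y2) + x2 * (y1 - y0);
        if (Math.abs(d) < 1e-12) {   // arbitrary threshold (assumption)
            throw new IllegalArgumentException("source points are collinear");
        }
        double a00 = (y0 * (u1 - u2) + y1 * (u2 - u0) + y2 * (u0 - u1)) / d;
        double a01 = (x0 * (u2 - u1) + x1 * (u0 - u2) + x2 * (u1 - u0)) / d;
        double a10 = (y0 * (v1 - v2) + y1 * (v2 - v0) + y2 * (v0 - v1)) / d;
        double a11 = (x0 * (v2 - v1) + x1 * (v0 - v2) + x2 * (v1 - v0)) / d;
        double a02 = (x0 * (y2 * u1 - y1 * u2) + x1 * (y0 * u2 - y2 * u0)
                + x2 * (y1 * u0 - y0 * u1)) / d;
        double a12 = (x0 * (y2 * v1 - y1 * v2) + x1 * (y0 * v2 - y2 * v0)
                + x2 * (y1 * v0 - y0 * v1)) / d;
        return new double[] {a00, a01, a02, a10, a11, a12};
    }

    public static void main(String[] args) {
        double[][] p = {{0, 0}, {1, 0}, {0, 1}};
        double[][] q = {{3, 4}, {5, 3}, {3.5, 5.5}};
        System.out.println(java.util.Arrays.toString(solve(p, q)));
    }
}
```

For the unit triangle in `main`, the parameters can also be read off by hand: $a_{02}, a_{12}$ equal $\boldsymbol{x}'_0$, and the remaining ones are the differences $\boldsymbol{x}'_1 - \boldsymbol{x}'_0$ and $\boldsymbol{x}'_2 - \boldsymbol{x}'_0$.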


Inverse  affine  mapping 

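The inverse of the affine transformation, which is often required in practice (see Sec. 21.2.2), can be calculated by simply applying the inverse of the transformation matrix $\mathbf{A}_\mathrm{affine}$ (Eqn. (21.20)) in homogeneous coordinate space, that is,

$$\underline{\boldsymbol{x}} = \mathbf{A}_\mathrm{affine}^{-1} \cdot \underline{\boldsymbol{x}}', \qquad (21.27)$$

or $\boldsymbol{x} = \mathrm{hom}^{-1}[\mathbf{A}_\mathrm{affine}^{-1} \cdot \mathrm{hom}(\boldsymbol{x}')]$ in Cartesian coordinates, that is,

$$\begin{pmatrix} x\\ y \end{pmatrix} = \mathrm{hom}^{-1}\left[\begin{pmatrix} a_{00} & a_{01} & a_{02}\\ a_{10} & a_{11} & a_{12}\\ 0 & 0 & 1 \end{pmatrix}^{-1} \cdot \begin{pmatrix} x'\\ y'\\ 1 \end{pmatrix}\right], \qquad (21.28)$$

with

$$\mathbf{A}_\mathrm{affine}^{-1} = \frac{1}{a_{00} a_{11} - a_{01} a_{10}} \begin{pmatrix} a_{11} & -a_{01} & a_{01} a_{12} - a_{02} a_{11}\\ -a_{10} & a_{00} & a_{02} a_{10} - a_{00} a_{12}\\ 0 & 0 & a_{00} a_{11} - a_{01} a_{10} \end{pmatrix}, \qquad (21.29)$$

and thus

$$\begin{pmatrix} x\\ y \end{pmatrix} = \frac{1}{a_{00} a_{11} - a_{01} a_{10}} \begin{pmatrix} a_{11} & -a_{01} & a_{01} a_{12} - a_{02} a_{11}\\ -a_{10} & a_{00} & a_{02} a_{10} - a_{00} a_{12} \end{pmatrix} \cdot \begin{pmatrix} x'\\ y'\\ 1 \end{pmatrix}. \qquad (21.30)$$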
Since the bottom row of $\mathbf{A}_\mathrm{affine}^{-1}$ in Eqn. (21.29) consists of the elements (0, 0, 1), the inverse mapping is again an affine transformation. Of course, the inverse of the affine mapping can also be found directly (i.e., without inverting the transformation matrix) from the given point coordinates $(\boldsymbol{x}_i, \boldsymbol{x}'_i)$ by using Eqns. (21.25) and (21.26) with interchanged source and target coordinates.
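A minimal sketch of Eqn. (21.29), assuming the six-parameter array layout $a_{00}, a_{01}, a_{02}, a_{10}, a_{11}, a_{12}$ with the bottom row (0, 0, 1) left implicit. The class name `AffineInverse` is hypothetical; a round trip through the forward and inverse mapping should return the original point.

```java
// Hypothetical helper (illustration only): inverts an affine mapping using
// the closed form of Eqn. (21.29). Parameters are stored as
// {a00, a01, a02, a10, a11, a12}; the bottom row (0, 0, 1) is implicit.
class AffineInverse {

    static double[] invert(double[] a) {
        double a00 = a[0], a01 = a[1], a02 = a[2];
        double a10 = a[3], a11 = a[4], a12 = a[5];
        double det = a00 * a11 - a01 * a10;
        if (det == 0) {
            throw new IllegalArgumentException("affine matrix is singular");
        }
        return new double[] {
            a11 / det, -a01 / det, (a01 * a12 - a02 * a11) / det,
            -a10 / det, a00 / det, (a02 * a10 - a00 * a12) / det
        };
    }

    // x' = a00*x + a01*y + a02, y' = a10*x + a11*y + a12 (Eqn. 21.20)
    static double[] apply(double[] a, double x, double y) {
        return new double[] {
            a[0] * x + a[1] * y + a[2],
            a[3] * x + a[4] * y + a[5]
        };
    }

    public static void main(String[] args) {
        double[] a = {2, 0.5, 3, -1, 1.5, 4};
        double[] q = apply(a, 1.25, -3.5);          // forward mapping
        double[] p = apply(invert(a), q[0], q[1]);  // back to the source point
        System.out.println(java.util.Arrays.toString(p));
    }
}
```

In a target-to-source implementation (Sec. 21.2.2), the inverse parameters would be computed once and then applied to every pixel of the target image.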


21.1.4  Projective  (Four-Point)  Mapping 

In contrast to the affine transformation, which provides a mapping between arbitrary triangles, the projective transformation is a mapping between arbitrary quadrilaterals (Fig. 21.3). This is particularly useful for deforming images controlled by mesh partitioning, as described in Sec. 21.1.7. To map from an arbitrary sequence of four 2D points $(\boldsymbol{x}_0, \boldsymbol{x}_1, \boldsymbol{x}_2, \boldsymbol{x}_3)$ to a set of corresponding points $(\boldsymbol{x}'_0, \boldsymbol{x}'_1, \boldsymbol{x}'_2, \boldsymbol{x}'_3)$, the transformation requires eight degrees of freedom, two more than needed for the affine transformation. Analogous to the affine transformation (Eqn. (21.20)), a projective transformation can be expressed as a linear mapping in homogeneous coordinates,


$$
\hat{\mathbf{x}}' = \mathbf{A}_{\text{proj}} \cdot \hat{\mathbf{x}}
= \begin{pmatrix} a_{00} & a_{01} & a_{02} \\ a_{10} & a_{11} & a_{12} \\ a_{20} & a_{21} & 1 \end{pmatrix}
\cdot \begin{pmatrix} x \\ y \\ 1 \end{pmatrix}, \tag{21.31}
$$

or

$$
\mathbf{x}' = \mathrm{hom}^{-1}\big[\mathbf{A}_{\text{proj}} \cdot \mathrm{hom}(\mathbf{x})\big] \tag{21.32}
$$

in Cartesian coordinates, that is,

$$
\begin{pmatrix} x' \\ y' \end{pmatrix}
= \frac{1}{a_{20} \cdot x + a_{21} \cdot y + 1} \cdot
\begin{pmatrix} a_{00} & a_{01} & a_{02} \\ a_{10} & a_{11} & a_{12} \end{pmatrix}
\cdot \begin{pmatrix} x \\ y \\ 1 \end{pmatrix}, \tag{21.33}
$$

with the two additional elements (parameters) $a_{20}$ and $a_{21}$ in the transformation matrix $\mathbf{A}_{\text{proj}}$. Because x and y appear in the denominator of the fraction in Eqn. (21.33), the projective mapping is generally nonlinear in Cartesian coordinates. Despite this nonlinearity, straight lines are preserved under this transformation. In fact, this is the most general transformation that maps straight lines to straight lines in 2D, and it actually maps any $N$th-order algebraic curve onto another $N$th-order algebraic curve. In particular, circles and ellipses always transform into other second-order curves (i.e., conic sections). Unlike the affine transformation, however, parallel lines do not generally map to parallel lines under a projective transformation (cf. Fig. 21.3), and the distance ratios between points on a line are not preserved. The projective mapping is therefore sometimes referred to as “perspective” or “pseudo-perspective”.

Fig. 21.3

Projective mapping. Four pairs of corresponding 2D points, $(\mathbf{x}_0, \mathbf{x}'_0)$, $(\mathbf{x}_1, \mathbf{x}'_1)$, $(\mathbf{x}_2, \mathbf{x}'_2)$, $(\mathbf{x}_3, \mathbf{x}'_3)$, uniquely specify a projective transformation. Straight lines are again mapped to straight lines, and a rectangle is mapped to some quadrilateral. In general, neither parallelism between straight lines nor the distance ratio is preserved.
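The following sketch applies Eqn. (21.33) to individual points and checks two of the properties stated above: three collinear points remain collinear, but the image of a segment's midpoint is in general not the midpoint of the mapped segment. The class name and the sample matrix are hypothetical, and the matrix is assumed to be normalized so that its lower-right element is 1.

```java
/** Projective mapping in Cartesian coordinates, Eqn. (21.33): matrix product, then division by h. */
final class ProjectiveApply {

    /** Maps (x, y) with a = {{a00, a01, a02}, {a10, a11, a12}, {a20, a21, 1}}. */
    static double[] apply(double[][] a, double x, double y) {
        double h = a[2][0] * x + a[2][1] * y + 1.0;   // common denominator of Eqn. (21.33)
        if (h == 0) {
            throw new ArithmeticException("point is mapped to infinity");
        }
        return new double[] {
            (a[0][0] * x + a[0][1] * y + a[0][2]) / h,
            (a[1][0] * x + a[1][1] * y + a[1][2]) / h };
    }

    /** z-component of (q - p) x (r - p); it is zero exactly when p, q, r are collinear. */
    static double collinearity(double[] p, double[] q, double[] r) {
        return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0]);
    }
}
```

For example, the points (0, 1), (1, 3), and (2, 5) lie on the line y = 2x + 1; after mapping them with any such matrix, collinearity() of the three results is zero up to rounding, while the mapped middle point is generally shifted away from the midpoint of the other two.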


Projective transformation parameters from four point pairs

Given four pairs of corresponding 2D points, $(\mathbf{x}_0, \mathbf{x}'_0), \ldots, (\mathbf{x}_3, \mathbf{x}'_3)$, with one point $\mathbf{x}_i = (x_i, y_i)^{\mathsf{T}}$ in the source image and the second point $\mathbf{x}'_i = (x'_i, y'_i)^{\mathsf{T}}$ in the target image, the eight unknown transformation parameters $a_{00}, \ldots, a_{21}$ can be found by solving a system of linear equations. Multiplying Eqn. (21.33) by the common denominator on the right-hand side gives


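$$
\begin{aligned}
x' \cdot (a_{20} \cdot x + a_{21} \cdot y + 1) &= a_{00} \cdot x + a_{01} \cdot y + a_{02}, \\
y' \cdot (a_{20} \cdot x + a_{21} \cdot y + 1) &= a_{10} \cdot x + a_{11} \cdot y + a_{12},
\end{aligned} \tag{21.34}
$$

and thus

$$
\begin{aligned}
a_{20} \cdot x \cdot x' + a_{21} \cdot y \cdot x' + x' &= a_{00} \cdot x + a_{01} \cdot y + a_{02}, \\
a_{20} \cdot x \cdot y' + a_{21} \cdot y \cdot y' + y' &= a_{10} \cdot x + a_{11} \cdot y + a_{12},
\end{aligned} \tag{21.35}
$$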


for any pair of corresponding points $\mathbf{x} = (x, y)^{\mathsf{T}}$ and $\mathbf{x}' = (x', y')^{\mathsf{T}}$. By slightly rearranging Eqn. (21.35) and inserting the (known) source and target point coordinates $(x_i, y_i)$ and $(x'_i, y'_i)$, respectively, we obtain one such pair of linear equations

$$
\begin{aligned}
x'_i &= a_{00} \cdot x_i + a_{01} \cdot y_i + a_{02} - a_{20} \cdot x_i \cdot x'_i - a_{21} \cdot y_i \cdot x'_i, \\
y'_i &= a_{10} \cdot x_i + a_{11} \cdot y_i + a_{12} - a_{20} \cdot x_i \cdot y'_i - a_{21} \cdot y_i \cdot y'_i,
\end{aligned} \tag{21.36}
$$

for each point pair $i = 0, \ldots, 3$ and the eight unknowns $a_{00}, \ldots, a_{21}$. Combining the resulting eight equations in the usual matrix notation yields


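$$
\begin{pmatrix} x'_0 \\ y'_0 \\ x'_1 \\ y'_1 \\ x'_2 \\ y'_2 \\ x'_3 \\ y'_3 \end{pmatrix}
=
\begin{pmatrix}
x_0 & y_0 & 1 & 0 & 0 & 0 & -x_0 x'_0 & -y_0 x'_0 \\
0 & 0 & 0 & x_0 & y_0 & 1 & -x_0 y'_0 & -y_0 y'_0 \\
x_1 & y_1 & 1 & 0 & 0 & 0 & -x_1 x'_1 & -y_1 x'_1 \\
0 & 0 & 0 & x_1 & y_1 & 1 & -x_1 y'_1 & -y_1 y'_1 \\
x_2 & y_2 & 1 & 0 & 0 & 0 & -x_2 x'_2 & -y_2 x'_2 \\
0 & 0 & 0 & x_2 & y_2 & 1 & -x_2 y'_2 & -y_2 y'_2 \\
x_3 & y_3 & 1 & 0 & 0 & 0 & -x_3 x'_3 & -y_3 x'_3 \\
0 & 0 & 0 & x_3 & y_3 & 1 & -x_3 y'_3 & -y_3 y'_3
\end{pmatrix}
\cdot
\begin{pmatrix} a_{00} \\ a_{01} \\ a_{02} \\ a_{10} \\ a_{11} \\ a_{12} \\ a_{20} \\ a_{21} \end{pmatrix}
\tag{21.37}
$$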
or

$$
\mathbf{b} = \mathbf{M} \cdot \mathbf{a}. \tag{21.38}
$$

Note that all elements of the vector $\mathbf{b} = (x'_0, \ldots, y'_3)^{\mathsf{T}}$ and the matrix $\mathbf{M}$ are obtained from the specified point coordinates and are thus constants. The unknown parameters $\mathbf{a} = (a_{00}, \ldots, a_{21})^{\mathsf{T}}$ can be found by solving the system of linear equations in Eqn. (21.38) with standard numerical methods, for example, the Gauss algorithm [35, p. 276]. It is recommended to use proven numerical software for this purpose.²
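A minimal Java sketch of this approach, assuming exactly four point pairs in general position (no three points collinear): it fills M and b row by row as in Eqn. (21.37) and solves the system by Gaussian elimination with partial pivoting. The class and method names are hypothetical, and the advice above to rely on proven numerical software still applies for production use.

```java
/** Sketch: the eight projective parameters from four point pairs via Eqn. (21.37). */
final class ProjectiveFromFourPoints {

    /** Returns {{a00, a01, a02}, {a10, a11, a12}, {a20, a21, 1}} mapping src[i] onto dst[i]. */
    static double[][] solve(double[][] src, double[][] dst) {
        double[][] m = new double[8][];
        double[] b = new double[8];
        for (int i = 0; i < 4; i++) {
            double x = src[i][0], y = src[i][1], xp = dst[i][0], yp = dst[i][1];
            m[2 * i]     = new double[] { x, y, 1, 0, 0, 0, -x * xp, -y * xp };  // row for x'_i
            m[2 * i + 1] = new double[] { 0, 0, 0, x, y, 1, -x * yp, -y * yp };  // row for y'_i
            b[2 * i] = xp;
            b[2 * i + 1] = yp;
        }
        double[] a = gaussSolve(m, b);
        return new double[][] { { a[0], a[1], a[2] }, { a[3], a[4], a[5] }, { a[6], a[7], 1 } };
    }

    /** Gaussian elimination with partial pivoting; modifies m and b in place. */
    static double[] gaussSolve(double[][] m, double[] b) {
        int n = b.length;
        for (int k = 0; k < n; k++) {
            int p = k;                                    // row with the largest pivot in column k
            for (int i = k + 1; i < n; i++) {
                if (Math.abs(m[i][k]) > Math.abs(m[p][k])) p = i;
            }
            if (Math.abs(m[p][k]) < 1e-12) {
                throw new ArithmeticException("singular system (are three points collinear?)");
            }
            double[] tmpRow = m[k]; m[k] = m[p]; m[p] = tmpRow;
            double tmpB = b[k]; b[k] = b[p]; b[p] = tmpB;
            for (int i = k + 1; i < n; i++) {             // eliminate the entries below the pivot
                double f = m[i][k] / m[k][k];
                for (int j = k; j < n; j++) m[i][j] -= f * m[k][j];
                b[i] -= f * b[k];
            }
        }
        double[] a = new double[n];                       // back substitution
        for (int i = n - 1; i >= 0; i--) {
            double s = b[i];
            for (int j = i + 1; j < n; j++) s -= m[i][j] * a[j];
            a[i] = s / m[i][i];
        }
        return a;
    }
}
```

Fed with the quadrilaterals Q and Q' of the example at the end of this subsection, solve() returns the matrix A given there, scaled so that its lower-right element is 1.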

If we want to use more than four corresponding point pairs to recover the eight parameters of a projective transformation, the system of linear equations in Eqn. (21.37) becomes overdetermined, that is, the system has more equations than unknowns. In general, n pairs of corresponding points yield a stack of 2n equations, so the vector $\mathbf{b}$ in Eqn. (21.37) has length 2n and the matrix $\mathbf{M}$ is of size 2n × 8 (the vector $\mathbf{a}$ remains the same). Overdetermined systems like this can be solved in a least-squares sense (minimizing $\|\mathbf{M} \cdot \mathbf{a} - \mathbf{b}\|$), for example, using the singular value decomposition (SVD) or the QR decomposition of $\mathbf{M}$ [96, 145].³ Other solutions for the multi-point case are discussed later in this section (see p. 524).
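For n > 4 point pairs, one simple (though numerically less robust) alternative to the SVD or QR decomposition mentioned above is to solve the normal equations $(\mathbf{M}^{\mathsf{T}}\mathbf{M}) \cdot \mathbf{a} = \mathbf{M}^{\mathsf{T}} \cdot \mathbf{b}$. The sketch below accumulates both products pair by pair and solves the resulting 8 × 8 system by a Cholesky factorization. Be aware that forming $\mathbf{M}^{\mathsf{T}}\mathbf{M}$ squares the condition number of $\mathbf{M}$, which is why QR or SVD is preferable for badly conditioned data. Class and method names are hypothetical.

```java
/** Sketch: least-squares projective parameters from n >= 4 point pairs via the normal equations. */
final class ProjectiveLeastSquares {

    static double[][] fit(double[][] src, double[][] dst) {
        int n = src.length;
        if (n < 4 || dst.length != n) throw new IllegalArgumentException("need n >= 4 point pairs");
        double[][] mtm = new double[8][8];   // accumulates M^T M
        double[] mtb = new double[8];        // accumulates M^T b
        for (int i = 0; i < n; i++) {
            double x = src[i][0], y = src[i][1], xp = dst[i][0], yp = dst[i][1];
            double[][] rows = {              // the two rows of M contributed by pair i
                { x, y, 1, 0, 0, 0, -x * xp, -y * xp },
                { 0, 0, 0, x, y, 1, -x * yp, -y * yp } };
            double[] rhs = { xp, yp };       // the matching entries of b
            for (int r = 0; r < 2; r++) {
                for (int j = 0; j < 8; j++) {
                    mtb[j] += rows[r][j] * rhs[r];
                    for (int k = 0; k < 8; k++) mtm[j][k] += rows[r][j] * rows[r][k];
                }
            }
        }
        double[] a = cholesky(mtm, mtb);
        return new double[][] { { a[0], a[1], a[2] }, { a[3], a[4], a[5] }, { a[6], a[7], 1 } };
    }

    /** Solves s * a = t for a symmetric positive-definite s by factoring s = L * L^T. */
    static double[] cholesky(double[][] s, double[] t) {
        int n = t.length;
        double[][] l = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j <= i; j++) {
                double sum = s[i][j];
                for (int k = 0; k < j; k++) sum -= l[i][k] * l[j][k];
                if (i == j) {
                    if (sum <= 0) throw new ArithmeticException("normal matrix is not positive definite");
                    l[i][i] = Math.sqrt(sum);
                } else {
                    l[i][j] = sum / l[j][j];
                }
            }
        }
        double[] z = new double[n];                       // forward substitution: L z = t
        for (int i = 0; i < n; i++) {
            double sum = t[i];
            for (int k = 0; k < i; k++) sum -= l[i][k] * z[k];
            z[i] = sum / l[i][i];
        }
        double[] a = new double[n];                       // back substitution: L^T a = z
        for (int i = n - 1; i >= 0; i--) {
            double sum = z[i];
            for (int k = i + 1; k < n; k++) sum -= l[k][i] * a[k];
            a[i] = sum / l[i][i];
        }
        return a;
    }
}
```

With noise-free correspondences the least-squares solution reproduces the generating transformation exactly (up to rounding); with noisy point matches it returns the parameters minimizing the algebraic error of Eqn. (21.37).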

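Inverse projective mapping

In general, any linear transformation of the form $\hat{\mathbf{x}}' = \mathbf{A} \cdot \hat{\mathbf{x}}$ (in homogeneous coordinates $\hat{\mathbf{x}}, \hat{\mathbf{x}}'$) can be inverted by applying the inverse of the matrix $\mathbf{A}$, that is,

$$
\hat{\mathbf{x}} = \mathbf{A}^{-1} \cdot \hat{\mathbf{x}}', \tag{21.39}
$$

provided that $\mathbf{A}$ is nonsingular ($\det(\mathbf{A}) \neq 0$). The inverse of a 3 × 3 matrix $\mathbf{A}$ is comparatively easy to find in closed form using the relation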


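$$
\mathbf{A}^{-1} = \frac{1}{\det(\mathbf{A})} \cdot \operatorname{adj}(\mathbf{A}), \tag{21.40}
$$

with the determinant $\det(\mathbf{A})$ and the adjugate matrix $\operatorname{adj}(\mathbf{A})$ (see, e.g., [35, pp. 251, 260], [145, p. 219]). In particular, for a real-valued 3 × 3 matrix

$$
\mathbf{A} = \begin{pmatrix} a_{00} & a_{01} & a_{02} \\ a_{10} & a_{11} & a_{12} \\ a_{20} & a_{21} & a_{22} \end{pmatrix}, \tag{21.41}
$$

the determinant can be calculated as

$$
\begin{aligned}
\det(\mathbf{A}) = {} & a_{00}a_{11}a_{22} + a_{01}a_{12}a_{20} + a_{02}a_{10}a_{21} \\
& - a_{00}a_{12}a_{21} - a_{01}a_{10}a_{22} - a_{02}a_{11}a_{20},
\end{aligned} \tag{21.42}
$$

and the 3 × 3 adjugate matrix is

$$
\operatorname{adj}(\mathbf{A}) = \begin{pmatrix}
a_{11}a_{22} - a_{12}a_{21} & a_{02}a_{21} - a_{01}a_{22} & a_{01}a_{12} - a_{02}a_{11} \\
a_{12}a_{20} - a_{10}a_{22} & a_{00}a_{22} - a_{02}a_{20} & a_{02}a_{10} - a_{00}a_{12} \\
a_{10}a_{21} - a_{11}a_{20} & a_{01}a_{20} - a_{00}a_{21} & a_{00}a_{11} - a_{01}a_{10}
\end{pmatrix}. \tag{21.43}
$$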


2 See Sec. B.7.1 in the Appendix.

3 See Sec. B.7.2 in the Appendix.
In the special case of a projective mapping, the coefficient $a_{22}$ is 1 (Eqn. (21.32)), which slightly simplifies the calculation.

Since scalar multiples of homogeneous vectors are all equivalent in Cartesian space (see Eqn. (21.18)), the multiplication by the constant factor $1/\det(\mathbf{A})$ in Eqn. (21.40) can be omitted. Thus, to invert a linear 2D transformation specified by a 3 × 3 matrix $\mathbf{A}$, we only need to multiply the homogeneous coordinate vector with the adjugate matrix $\operatorname{adj}(\mathbf{A})$, that is,

$$
\hat{\mathbf{x}} = \mathbf{A}^{-1} \cdot \hat{\mathbf{x}}' \equiv \operatorname{adj}(\mathbf{A}) \cdot \hat{\mathbf{x}}'. \tag{21.44}
$$

Returning to Cartesian coordinates, the inverse transformation can be written as

$$
\mathbf{x} = \mathrm{hom}^{-1}\big[\operatorname{adj}(\mathbf{A}) \cdot \mathrm{hom}(\mathbf{x}')\big]. \tag{21.45}
$$


This method can be used to invert any nonsingular linear transformation in 2D, including the affine and projective mapping functions described already. Consequently, the inversion of the affine transformation shown earlier (see Eqn. (21.29)) is only a special case of this general method.

Of course, matrix inversion may also be implemented with standard linear algebra software, which is not only less error-prone but also offers better numerical stability (see also Sec. B.1 in the Appendix).
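The closed-form inversion of Eqns. (21.40) to (21.45) translates directly into code. The following sketch (hypothetical class name, not a library API) computes $\det(\mathbf{A})$, $\operatorname{adj}(\mathbf{A})$, and the full inverse, and also maps a point back using only the adjugate, relying on the homogeneous division to cancel the factor $1/\det(\mathbf{A})$.

```java
/** Inversion of a 3x3 matrix via determinant and adjugate, Eqns. (21.40)-(21.45). */
final class AdjugateInverse {

    /** Determinant as in Eqn. (21.42). */
    static double det(double[][] a) {
        return a[0][0] * a[1][1] * a[2][2] + a[0][1] * a[1][2] * a[2][0] + a[0][2] * a[1][0] * a[2][1]
             - a[0][0] * a[1][2] * a[2][1] - a[0][1] * a[1][0] * a[2][2] - a[0][2] * a[1][1] * a[2][0];
    }

    /** Adjugate matrix as in Eqn. (21.43). */
    static double[][] adj(double[][] a) {
        return new double[][] {
            { a[1][1] * a[2][2] - a[1][2] * a[2][1], a[0][2] * a[2][1] - a[0][1] * a[2][2], a[0][1] * a[1][2] - a[0][2] * a[1][1] },
            { a[1][2] * a[2][0] - a[1][0] * a[2][2], a[0][0] * a[2][2] - a[0][2] * a[2][0], a[0][2] * a[1][0] - a[0][0] * a[1][2] },
            { a[1][0] * a[2][1] - a[1][1] * a[2][0], a[0][1] * a[2][0] - a[0][0] * a[2][1], a[0][0] * a[1][1] - a[0][1] * a[1][0] } };
    }

    /** Full inverse, Eqn. (21.40). */
    static double[][] inverse(double[][] a) {
        double d = det(a);
        if (Math.abs(d) < 1e-12) throw new ArithmeticException("matrix is singular");
        double[][] r = adj(a);
        for (double[] row : r) {
            for (int j = 0; j < 3; j++) row[j] /= d;
        }
        return r;
    }

    /** Eqn. (21.45): maps x' back to x using only adj(A); 1/det cancels in the division by h. */
    static double[] invertPoint(double[][] a, double xp, double yp) {
        double[][] m = adj(a);
        double hx = m[0][0] * xp + m[0][1] * yp + m[0][2];
        double hy = m[1][0] * xp + m[1][1] * yp + m[1][2];
        double h  = m[2][0] * xp + m[2][1] * yp + m[2][2];
        return new double[] { hx / h, hy / h };
    }
}
```

Applied to the matrix $\mathbf{A}_1$ of the example later in this section (with its exact entries 10/3 and 1/3), inverse() yields the matrix $\mathbf{A}_1^{-1}$ listed there, and invertPoint() maps the corner (7, 9) back to (1, 1) of the unit square.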


Projective mapping via the unit square

An alternative method for finding the projective mapping parameters for a given set of image points is to use a two-stage mapping through the unit square $S_1$, which avoids numerically solving a system of equations [256, p. 55] [105]. The projective mapping, shown in Fig. 21.4, from the four corner points of the unit square $S_1$ to an arbitrary quadrilateral $Q = (\mathbf{x}'_0, \ldots, \mathbf{x}'_3)$ with

$$
(0, 0) \mapsto \mathbf{x}'_0, \quad (1, 0) \mapsto \mathbf{x}'_1, \quad (1, 1) \mapsto \mathbf{x}'_2, \quad (0, 1) \mapsto \mathbf{x}'_3, \tag{21.46}
$$

reduces the system of equations in Eqn. (21.37) to

$$
\begin{aligned}
x'_0 &= a_{02}, \\
y'_0 &= a_{12}, \\
x'_1 &= a_{00} + a_{02} - a_{20} \cdot x'_1, \\
y'_1 &= a_{10} + a_{12} - a_{20} \cdot y'_1, \\
x'_2 &= a_{00} + a_{01} + a_{02} - a_{20} \cdot x'_2 - a_{21} \cdot x'_2, \\
y'_2 &= a_{10} + a_{11} + a_{12} - a_{20} \cdot y'_2 - a_{21} \cdot y'_2, \\
x'_3 &= a_{01} + a_{02} - a_{21} \cdot x'_3, \\
y'_3 &= a_{11} + a_{12} - a_{21} \cdot y'_3.
\end{aligned} \tag{21.47}
$$

Fig. 21.4

Projective mapping from the unit square $S_1$ to an arbitrary quadrilateral $Q = (\mathbf{x}'_0, \ldots, \mathbf{x}'_3)$.


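This set of equations has the following closed-form solution for the eight unknown transformation parameters $a_{00}, a_{01}, \ldots, a_{21}$:

$$
a_{20} = \frac{(x'_0 - x'_1 + x'_2 - x'_3) \cdot (y'_3 - y'_2) - (y'_0 - y'_1 + y'_2 - y'_3) \cdot (x'_3 - x'_2)}{(x'_1 - x'_2) \cdot (y'_3 - y'_2) - (x'_3 - x'_2) \cdot (y'_1 - y'_2)}, \tag{21.48}
$$

$$
a_{21} = \frac{(y'_0 - y'_1 + y'_2 - y'_3) \cdot (x'_1 - x'_2) - (x'_0 - x'_1 + x'_2 - x'_3) \cdot (y'_1 - y'_2)}{(x'_1 - x'_2) \cdot (y'_3 - y'_2) - (x'_3 - x'_2) \cdot (y'_1 - y'_2)}, \tag{21.49}
$$

and

$$
a_{00} = x'_1 - x'_0 + a_{20} \cdot x'_1, \qquad a_{01} = x'_3 - x'_0 + a_{21} \cdot x'_3, \qquad a_{02} = x'_0, \tag{21.50}
$$

$$
a_{10} = y'_1 - y'_0 + a_{20} \cdot y'_1, \qquad a_{11} = y'_3 - y'_0 + a_{21} \cdot y'_3, \qquad a_{12} = y'_0. \tag{21.51}
$$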

By calculating the inverse of the corresponding 3 × 3 transformation matrix (Eqn. (21.40)), the mapping may be reversed to transform an arbitrary quadrilateral to the unit square. A mapping T between two arbitrary quadrilaterals,

$$
Q \xrightarrow{\;T\;} Q',
$$

can thus be implemented by combining a reversed mapping and a forward mapping via the unit square. As illustrated in Fig. 21.5, the transformation of an arbitrary quadrilateral $Q = (\mathbf{x}_0, \mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3)$ to a second quadrilateral $Q' = (\mathbf{x}'_0, \mathbf{x}'_1, \mathbf{x}'_2, \mathbf{x}'_3)$ is accomplished in two steps involving the linear transformations $T_1$ and $T_2$ between the two quadrilaterals and the unit square $S_1$, that is,

$$
Q \xrightarrow{\;T_1^{-1}\;} S_1 \xrightarrow{\;T_2\;} Q'. \tag{21.52}
$$

The parameters for the projective transformations $T_1$ and $T_2$ are obtained by inserting the corresponding point coordinates of Q and Q' ($\mathbf{x}_i$ and $\mathbf{x}'_i$, respectively) into Eqns. (21.48)–(21.51). The complete transformation T is then the concatenation of the two transformations $T_1^{-1}$ and $T_2$, that is,

$$
\mathbf{x}' = T(\mathbf{x}) = T_2\big(T_1^{-1}(\mathbf{x})\big), \tag{21.53}
$$

or, expressed in matrix notation (using homogeneous coordinates),

$$
\hat{\mathbf{x}}' = \mathbf{A} \cdot \hat{\mathbf{x}} = \mathbf{A}_2 \cdot \mathbf{A}_1^{-1} \cdot \hat{\mathbf{x}}. \tag{21.54}
$$

Of course, the matrix $\mathbf{A} = \mathbf{A}_2 \cdot \mathbf{A}_1^{-1}$ needs to be calculated only once for a particular transformation and can then be used repeatedly for mapping any other image points $\mathbf{x}_i$.
Fig. 21.5

Two-step projective transformation between arbitrary quadrilaterals. In the first step, quadrilateral Q is transformed to the unit square $S_1$ by the inverse mapping function $T_1^{-1}$. In the second step, $T_2$ transforms the square $S_1$ to the target quadrilateral Q'. The complete mapping T results from the concatenation of the mappings $T_1^{-1}$ and $T_2$.
Example

The source and the target quadrilaterals Q and Q', respectively, are specified by the following coordinate points:

Q: $\mathbf{x}_0 = (2, 5)$, $\mathbf{x}_1 = (4, 6)$, $\mathbf{x}_2 = (7, 9)$, $\mathbf{x}_3 = (5, 9)$;
Q': $\mathbf{x}'_0 = (4, 3)$, $\mathbf{x}'_1 = (5, 2)$, $\mathbf{x}'_2 = (9, 3)$, $\mathbf{x}'_3 = (7, 5)$.

Using Eqns. (21.48)–(21.51), the transformation parameters (matrices) for the projective mappings from the unit square $S_1$ to the quadrilaterals, $\mathbf{A}_1\colon S_1 \to Q$ and $\mathbf{A}_2\colon S_1 \to Q'$, are obtained as

$$
\mathbf{A}_1 = \begin{pmatrix} 3.33 & 0.50 & 2.00 \\ 3.00 & -0.50 & 5.00 \\ 0.33 & -0.50 & 1.00 \end{pmatrix}
\quad\text{and}\quad
\mathbf{A}_2 = \begin{pmatrix} 1.00 & -0.50 & 4.00 \\ -1.00 & -0.50 & 3.00 \\ 0.00 & -0.50 & 1.00 \end{pmatrix}.
$$

Concatenating the inverse mapping $\mathbf{A}_1^{-1}$ with $\mathbf{A}_2$ (by matrix multiplication), we get the complete mapping $\mathbf{A} = \mathbf{A}_2 \cdot \mathbf{A}_1^{-1}$ with

$$
\mathbf{A}_1^{-1} = \begin{pmatrix} 0.60 & -0.45 & 1.05 \\ -0.40 & 0.80 & -3.20 \\ -0.40 & 0.55 & -0.95 \end{pmatrix}
\quad\text{and}\quad
\mathbf{A} = \begin{pmatrix} -0.80 & 1.35 & -1.15 \\ -1.60 & 1.70 & -2.30 \\ -0.20 & 0.15 & 0.65 \end{pmatrix}.
$$

The library method makeMapping() in class ProjectiveMapping (see Sec. 21.3) is an implementation of this two-step technique.
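The two-step technique also fits into a few lines of plain Java. This is not the imagingbook implementation of makeMapping(); it is a hypothetical, self-contained sketch that builds $\mathbf{A}_1$ and $\mathbf{A}_2$ with Eqns. (21.48)–(21.51), inverts $\mathbf{A}_1$ with the cofactor (adjugate) formula, and concatenates the matrices as in Eqn. (21.54). Corner points are assumed to be given in the order of Eqn. (21.46).

```java
import java.util.Arrays;

/** Projective mapping between two quadrilaterals via the unit square, Eqns. (21.48)-(21.54). */
final class QuadToQuad {

    /** Maps the unit square onto q = {x'0, x'1, x'2, x'3} (corner order as in Eqn. (21.46)). */
    static double[][] squareToQuad(double[][] q) {
        double x0 = q[0][0], y0 = q[0][1], x1 = q[1][0], y1 = q[1][1];
        double x2 = q[2][0], y2 = q[2][1], x3 = q[3][0], y3 = q[3][1];
        double sx = x0 - x1 + x2 - x3, sy = y0 - y1 + y2 - y3;
        double d = (x1 - x2) * (y3 - y2) - (x3 - x2) * (y1 - y2);
        if (d == 0) throw new ArithmeticException("degenerate quadrilateral");
        double a20 = (sx * (y3 - y2) - sy * (x3 - x2)) / d;   // Eqn. (21.48)
        double a21 = (sy * (x1 - x2) - sx * (y1 - y2)) / d;   // Eqn. (21.49)
        return new double[][] {
            { x1 - x0 + a20 * x1, x3 - x0 + a21 * x3, x0 },   // Eqn. (21.50)
            { y1 - y0 + a20 * y1, y3 - y0 + a21 * y3, y0 },   // Eqn. (21.51)
            { a20, a21, 1 } };
    }

    /** 3x3 inverse: cofactor C_ij (cyclic-index form) is stored transposed, then divided by det. */
    static double[][] inverse(double[][] a) {
        double[][] inv = new double[3][3];
        double det = 0;
        for (int i = 0; i < 3; i++) {
            for (int j = 0; j < 3; j++) {
                double c = a[(i + 1) % 3][(j + 1) % 3] * a[(i + 2) % 3][(j + 2) % 3]
                         - a[(i + 1) % 3][(j + 2) % 3] * a[(i + 2) % 3][(j + 1) % 3];
                inv[j][i] = c;                    // adjugate = transposed cofactor matrix
                if (i == 0) det += a[0][j] * c;   // expansion along the first row
            }
        }
        if (det == 0) throw new ArithmeticException("matrix is singular");
        for (double[] row : inv) {
            for (int k = 0; k < 3; k++) row[k] /= det;
        }
        return inv;
    }

    static double[][] multiply(double[][] a, double[][] b) {
        double[][] c = new double[3][3];
        for (int i = 0; i < 3; i++)
            for (int j = 0; j < 3; j++)
                for (int k = 0; k < 3; k++) c[i][j] += a[i][k] * b[k][j];
        return c;
    }

    /** A = A2 * A1^-1 maps quadrilateral q onto quadrilateral qp, Eqn. (21.54). */
    static double[][] quadToQuad(double[][] q, double[][] qp) {
        return multiply(squareToQuad(qp), inverse(squareToQuad(q)));
    }

    public static void main(String[] args) {
        double[][] q  = { { 2, 5 }, { 4, 6 }, { 7, 9 }, { 5, 9 } };   // Q from the example
        double[][] qp = { { 4, 3 }, { 5, 2 }, { 9, 3 }, { 7, 5 } };   // Q'
        for (double[] row : quadToQuad(q, qp)) System.out.println(Arrays.toString(row));
    }
}
```

Running main() reproduces the matrix $\mathbf{A}$ of the example up to floating-point rounding; the entries printed as 3.33 and 0.33 in $\mathbf{A}_1$ are exactly 10/3 and 1/3.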


Projective transformation parameters from more than four point pairs

The projective transformation in Eqn. (21.32) describes a mapping between pairs of arbitrary quadrilaterals in the 2D plane. This geometric relation is also known under the terms projective isomorphism or homography. The concept is frequently encountered in computer vision, because the transformation between two views of a planar 3D point set can be modeled as a homography (with only eight degrees of freedom) in the 2D image plane, which is important, for example, for camera calibration and 3D surface reconstruction. In this context, it is often necessary to estimate the homography parameters from a larger set of 2D point matches, for example, from multiple points assumed to be located on a planar 3D surface. This is the same problem as finding the projective mapping between sets of $n > 4$ corresponding point pairs in 2D.

Several approaches to “homography estimation” exist, including linear and (iterative) nonlinear methods. The simplest and most common is the direct linear transform (DLT) method [56, 103], which requires solving a system of 2n homogeneous linear equations, typically done by singular value decomposition (SVD).


21.1.5 Bilinear Mapping

Similar to the projective transformation (Eqn. (21.32)), the bilinear mapping function

$$
\begin{aligned}
T_x\colon\; x' &= a_0 \cdot x + a_1 \cdot y + a_2 \cdot x \cdot y + a_3, \\
T_y\colon\; y' &= b_0 \cdot x + b_1 \cdot y + b_2 \cdot x \cdot y + b_3,
\end{aligned} \tag{21.55}
$$

is specified with four pairs of corresponding points and has eight parameters $(a_0, \ldots, a_3, b_0, \ldots, b_3)$. The transformation is nonlinear because of the mixed term $x \cdot y$ and cannot be described by a linear transformation, even with homogeneous coordinates. In contrast to the projective transformation, straight lines are not preserved in general but map onto quadratic curves. Similarly, circles are not mapped to ellipses by a bilinear transform.

A bilinear mapping is uniquely specified by four corresponding pairs of 2D points $(\mathbf{x}_0, \mathbf{x}'_0), \ldots, (\mathbf{x}_3, \mathbf{x}'_3)$. In the general case, for a bilinear mapping between arbitrary quadrilaterals, the coefficients $a_0, \ldots, a_3, b_0, \ldots, b_3$ (Eqn. (21.55)) are found as the solution of two separate systems of equations, each with four unknowns:
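$$
\begin{pmatrix} x'_0 \\ x'_1 \\ x'_2 \\ x'_3 \end{pmatrix}
= \begin{pmatrix} x_0 & y_0 & x_0 y_0 & 1 \\ x_1 & y_1 & x_1 y_1 & 1 \\ x_2 & y_2 & x_2 y_2 & 1 \\ x_3 & y_3 & x_3 y_3 & 1 \end{pmatrix}
\cdot \begin{pmatrix} a_0 \\ a_1 \\ a_2 \\ a_3 \end{pmatrix}
\quad\text{or}\quad \mathbf{x}' = \mathbf{M} \cdot \mathbf{a}, \tag{21.56}
$$

$$
\begin{pmatrix} y'_0 \\ y'_1 \\ y'_2 \\ y'_3 \end{pmatrix}
= \begin{pmatrix} x_0 & y_0 & x_0 y_0 & 1 \\ x_1 & y_1 & x_1 y_1 & 1 \\ x_2 & y_2 & x_2 y_2 & 1 \\ x_3 & y_3 & x_3 y_3 & 1 \end{pmatrix}
\cdot \begin{pmatrix} b_0 \\ b_1 \\ b_2 \\ b_3 \end{pmatrix}
\quad\text{or}\quad \mathbf{y}' = \mathbf{M} \cdot \mathbf{b}. \tag{21.57}
$$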


These equations can again be solved using standard numerical methods. In the special case of bilinearly mapping the unit square $S_1$ to an arbitrary quadrilateral $Q = (\mathbf{x}'_0, \ldots, \mathbf{x}'_3)$, the parameters $a_0, \ldots, a_3$ and $b_0, \ldots, b_3$ are found as

$$
a_0 = x'_1 - x'_0, \qquad b_0 = y'_1 - y'_0, \tag{21.58}
$$

$$
a_1 = x'_3 - x'_0, \qquad b_1 = y'_3 - y'_0, \tag{21.59}
$$

$$
a_2 = x'_0 - x'_1 + x'_2 - x'_3, \qquad b_2 = y'_0 - y'_1 + y'_2 - y'_3, \tag{21.60}
$$

$$
a_3 = x'_0, \qquad b_3 = y'_0. \tag{21.61}
$$
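The unit-square case of Eqns. (21.58)–(21.61) can be sketched in Java as follows (hypothetical class name), using the corner order of Eqn. (21.46). Mapping the center of the unit square illustrates the nonlinearity: for the quadrilateral Q of the projective example, with corners (2, 5), (4, 6), (7, 9), (5, 9), the point (0.5, 0.5) maps to (4.5, 7.25), which is not the midpoint (4.5, 7) of the diagonal from (2, 5) to (7, 9), so the straight diagonal of the square becomes a curve.

```java
/** Bilinear mapping of the unit square onto a quadrilateral, Eqns. (21.55) and (21.58)-(21.61). */
final class BilinearSquareToQuad {
    final double a0, a1, a2, a3, b0, b1, b2, b3;

    /** q = {x'0, x'1, x'2, x'3}, images of the corners (0,0), (1,0), (1,1), (0,1). */
    BilinearSquareToQuad(double[][] q) {
        a0 = q[1][0] - q[0][0];                      b0 = q[1][1] - q[0][1];
        a1 = q[3][0] - q[0][0];                      b1 = q[3][1] - q[0][1];
        a2 = q[0][0] - q[1][0] + q[2][0] - q[3][0];  b2 = q[0][1] - q[1][1] + q[2][1] - q[3][1];
        a3 = q[0][0];                                b3 = q[0][1];
    }

    /** Evaluates Eqn. (21.55) at (x, y). */
    double[] apply(double x, double y) {
        return new double[] {
            a0 * x + a1 * y + a2 * x * y + a3,
            b0 * x + b1 * y + b2 * x * y + b3 };
    }
}
```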


Figure 21.6 shows results of the affine, projective, and bilinear transformations applied to a simple test pattern. The affine transformation (Fig. 21.6(b)) is specified by mapping to the triangle 1-2-3, while the four points of the quadrilateral 1-2-3-4 define the projective and the bilinear transforms (Fig. 21.6(c, d)).


Fig. 21.6

Geometric transformations compared: original image (a), affine transformation with respect to the triangle 1-2-3 (b), projective transformation (c), and bilinear transformation (d).


21.1.6 Other Nonlinear Image Transformations

The bilinear transformation discussed in the previous section is only one example of a nonlinear mapping in 2D that cannot be expressed as a simple matrix-vector multiplication in homogeneous coordinates. Many other types of nonlinear deformations exist, for example, those used to implement various artistic effects in creative imaging. This type of image deformation is often called “image warping”. Depending on the type of transformation used, the derivation of the inverse transformation function, which is required for the practical computation of the mapping using target-to-source mapping (see Sec. 21.2.2), is not always easy or may even be impossible. In the following three examples, we therefore look straight at the inverse maps

$$
\mathbf{x} = T^{-1}(\mathbf{x}') \tag{21.62}
$$

without really bothering about the corresponding forward transformations.


“Twirl” transformation

The twirl mapping causes the image to be rotated around a given anchor point $\mathbf{x}_c = (x_c, y_c)$ with a space-variant rotation angle, which has a fixed value $\alpha$ at the center $\mathbf{x}_c$ and decreases linearly with the radial distance from the center. The image remains unchanged outside the limiting radius $r_{\max}$. The associated (inverse) mapping is defined as
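$$
T_x^{-1}\colon\; x = \begin{cases} x_c + r \cdot \cos(\beta) & \text{for } r \le r_{\max}, \\ x' & \text{for } r > r_{\max}, \end{cases} \tag{21.63}
$$

$$
T_y^{-1}\colon\; y = \begin{cases} y_c + r \cdot \sin(\beta) & \text{for } r \le r_{\max}, \\ y' & \text{for } r > r_{\max}, \end{cases} \tag{21.64}
$$

with

$$
d_x = x' - x_c, \qquad r = \sqrt{d_x^2 + d_y^2}, \tag{21.65}
$$

$$
d_y = y' - y_c, \qquad \beta = \operatorname{ArcTan}(d_x, d_y) + \alpha \cdot \Big(\frac{r_{\max} - r}{r_{\max}}\Big). \tag{21.66}
$$

Fig. 21.7

Various nonlinear image deformations: twirl (a, d), ripple (b, e), and sphere (c, f) transformations. The size of the original images is 400 × 400 pixels.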


Figure 21.7(a, d) shows a twirl mapping with the anchor point $\mathbf{x}_c$ placed at the image center. The limiting radius $r_{\max}$ is half the length of the image diagonal, and the rotation angle is $\alpha = 43°$ at the center.
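A minimal Java sketch of the inverse twirl map of Eqns. (21.63)–(21.66) for a single target point (hypothetical class name). It assumes that ArcTan(d_x, d_y) denotes the angle of the vector (d_x, d_y), which corresponds to Math.atan2(dy, dx) in Java (note the reversed argument order).

```java
/** Inverse twirl mapping T^-1 of Eqns. (21.63)-(21.66) for a single target point (x', y'). */
final class Twirl {

    static double[] inverse(double xp, double yp, double xc, double yc, double alpha, double rMax) {
        double dx = xp - xc, dy = yp - yc;
        double r = Math.hypot(dx, dy);
        if (r > rMax) {
            return new double[] { xp, yp };                             // outside r_max: unchanged
        }
        double beta = Math.atan2(dy, dx) + alpha * (rMax - r) / rMax;  // angle plus linearly decaying twist
        return new double[] { xc + r * Math.cos(beta), yc + r * Math.sin(beta) };
    }
}
```

For the setting of Fig. 21.7(a, d) with a 400 × 400 image, one would pass xc = yc = 200, rMax = 0.5 · √(400² + 400²) ≈ 282.8, and alpha = Math.toRadians(43). Points exactly at distance rMax receive no extra rotation, so the twirl blends continuously into the unchanged outer region.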


“Ripple” transformation

The ripple transformation causes a local wavelike displacement of the image along both the x and y directions. The parameters of this mapping function are the period lengths $\tau_x, \tau_y \neq 0$ (in pixels) and the corresponding amplitude values $a_x, a_y$ for the displacement in both directions:

$$
T_x^{-1}\colon\; x = x' + a_x \cdot \sin\Big(\frac{2\pi \cdot y'}{\tau_x}\Big), \tag{21.67}
$$

$$
T_y^{-1}\colon\; y = y' + a_y \cdot \sin\Big(\frac{2\pi \cdot x'}{\tau_y}\Big). \tag{21.68}
$$

An example for the ripple mapping with $\tau_x = 120$, $\tau_y = 250$, $a_x = 10$, and $a_y = 15$ is shown in Fig. 21.7(b, e).
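Eqns. (21.67) and (21.68) as a short Java sketch (hypothetical class name), with the period lengths τx and τy given in pixels. Note that the horizontal displacement depends on y' and the vertical displacement on x'.

```java
/** Inverse ripple mapping of Eqns. (21.67) and (21.68); tauX and tauY are period lengths in pixels. */
final class Ripple {

    static double[] inverse(double xp, double yp, double ax, double ay, double tauX, double tauY) {
        if (tauX == 0 || tauY == 0) {
            throw new IllegalArgumentException("period lengths must be nonzero");
        }
        double x = xp + ax * Math.sin(2 * Math.PI * yp / tauX);   // horizontal shift varies with y'
        double y = yp + ay * Math.sin(2 * Math.PI * xp / tauY);   // vertical shift varies with x'
        return new double[] { x, y };
    }
}
```

With the parameters of Fig. 21.7(b, e) (τx = 120, τy = 250, ax = 10, ay = 15), a target pixel in row y' = 30, a quarter period, is sampled from 10 pixels further to the right in the source image.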

Spherical transformation

The spherical deformation imitates the effect of viewing the image through a transparent hemisphere or lens placed on top of the image. The parameters of this transformation are the position $\mathbf{x}_c = (x_c, y_c)$ of the lens center, the radius of the lens $r_{\max}$, and its refraction index $\rho$. The corresponding mapping functions are defined as


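$$
T_x^{-1}\colon\; x = x' - \begin{cases} z \cdot \tan(\beta_x) & \text{for } r \le r_{\max}, \\ 0 & \text{for } r > r_{\max}, \end{cases} \tag{21.69}
$$

$$
T_y^{-1}\colon\; y = y' - \begin{cases} z \cdot \tan(\beta_y) & \text{for } r \le r_{\max}, \\ 0 & \text{for } r > r_{\max}, \end{cases} \tag{21.70}
$$

with

$$
\begin{aligned}
d_x &= x' - x_c, & d_y &= y' - y_c, \\
r &= \sqrt{d_x^2 + d_y^2}, & z &= \sqrt{r_{\max}^2 - r^2}, \\
\beta_x &= \Big(1 - \frac{1}{\rho}\Big) \cdot \sin^{-1}\Big(\frac{d_x}{\sqrt{d_x^2 + z^2}}\Big), & \beta_y &= \Big(1 - \frac{1}{\rho}\Big) \cdot \sin^{-1}\Big(\frac{d_y}{\sqrt{d_y^2 + z^2}}\Big).
\end{aligned} \tag{21.71}
$$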


Figure 21.7(c, f) shows a spherical transformation with the lens positioned at the image center. The lens radius $r_{\max}$ is set to half of the image width, and the refraction index is $\rho = 1.8$.

See Exercise 21.4 for additional examples of nonlinear geometric transformations.
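A direct transcription of Eqns. (21.69)–(21.71) for a single target point (hypothetical class name). One detail is not covered by the equations: on the lens rim (r = r_max, hence z = 0), a zero d_x or d_y turns the argument of sin⁻¹ into 0/0. The sketch therefore sets the corresponding angle to 0 in that case, an added guard that does not change the result because the displacement is multiplied by z = 0 anyway.

```java
/** Inverse spherical (lens) mapping of Eqns. (21.69)-(21.71) for a single target point (x', y'). */
final class Sphere {

    static double[] inverse(double xp, double yp, double xc, double yc, double rMax, double rho) {
        double dx = xp - xc, dy = yp - yc;
        double r = Math.hypot(dx, dy);
        if (r > rMax) {
            return new double[] { xp, yp };                       // outside the lens: unchanged
        }
        double z = Math.sqrt(Math.max(0, rMax * rMax - r * r));   // height of the hemisphere at r
        double k = 1 - 1 / rho;
        double betaX = k * asinRatio(dx, z);
        double betaY = k * asinRatio(dy, z);
        return new double[] { xp - z * Math.tan(betaX), yp - z * Math.tan(betaY) };
    }

    /** sin^-1(d / sqrt(d^2 + z^2)); returns 0 if d = z = 0 (guard added here, not in Eqn. (21.71)). */
    private static double asinRatio(double d, double z) {
        double len = Math.hypot(d, z);
        return (len == 0) ? 0 : Math.asin(d / len);
    }
}
```

For a point inside the lens the source position is pulled toward the lens center, which produces the magnifying effect visible in Fig. 21.7(c, f).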


21.1.7 Piecewise Image Transformations

All the geometric transformations discussed so far are global (i.e., the same mapping function is applied to all pixels in the given image). It is often necessary to deform an image such that a larger number of n original image points $\mathbf{x}_0, \ldots, \mathbf{x}_{n-1}$ are precisely mapped onto a given set of target points $\mathbf{x}'_0, \ldots, \mathbf{x}'_{n-1}$. For $n = 3$, this problem can be solved with an affine mapping (see Sec. 21.1.3), and for $n = 4$ we could use a projective or bilinear mapping (see Secs. 21.1.4 and 21.1.5). A precise global mapping of $n > 4$ points requires a more complicated function $T(\mathbf{x})$ (e.g., a 2D nth-order polynomial or a spline function).

An alternative is to use local or piecewise transformations, where the image is partitioned into disjoint patches that are transformed separately, applying an individual mapping function to each patch. In practice, it is common to partition the image into a mesh of triangles or quadrilaterals, as illustrated in Fig. 21.8.

For a triangular mesh partitioning (Fig. 21.8(a)), the transformation between each pair of triangles $\mathcal{D}_i \to \mathcal{D}'_i$ could be accomplished with an affine mapping, whose parameters must be computed individually for every patch. Similarly, the projective transformation would be suitable for mapping each patch in a mesh partitioning composed of quadrilaterals $Q_i$ (Fig. 21.8(b)). Since both the affine and the projective transformations preserve the straightness of lines, we can be certain (as long as adjacent patches share their common corner points) that no holes or overlaps will arise and that the deformation will appear continuous between adjacent mesh patches.
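A triangle-mesh warp can be sketched compactly with barycentric coordinates: the unique affine map that takes a source triangle onto its target triangle sends a point with barycentric weights (λ0, λ1, λ2) to the same weighted combination of the target vertices. This is equivalent to computing the six affine parameters per patch as described above, just organized differently. The class and method names are hypothetical, and the linear search over all triangles is only meant for small meshes.

```java
/** Piecewise-affine warping sketch: each source/target triangle pair defines its own affine map. */
final class TriangleMeshWarp {

    /** Barycentric coordinates of p in triangle (a, b, c), or null if the triangle is degenerate. */
    static double[] barycentric(double[] p, double[] a, double[] b, double[] c) {
        double det = (b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1]);
        if (det == 0) return null;
        double l1 = ((p[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (p[1] - a[1])) / det;
        double l2 = ((b[0] - a[0]) * (p[1] - a[1]) - (p[0] - a[0]) * (b[1] - a[1])) / det;
        return new double[] { 1 - l1 - l2, l1, l2 };
    }

    /** Maps p through the first source triangle containing it; returns null if p lies outside the mesh. */
    static double[] map(double[] p, double[][][] srcTris, double[][][] dstTris) {
        final double eps = 1e-12;                   // tolerance for points on triangle edges
        for (int t = 0; t < srcTris.length; t++) {
            double[][] s = srcTris[t], d = dstTris[t];
            double[] l = barycentric(p, s[0], s[1], s[2]);
            if (l == null) continue;
            if (l[0] >= -eps && l[1] >= -eps && l[2] >= -eps) {
                // same weights applied to the target vertices = affine map of triangle s onto d
                return new double[] {
                    l[0] * d[0][0] + l[1] * d[1][0] + l[2] * d[2][0],
                    l[0] * d[0][1] + l[1] * d[1][1] + l[2] * d[2][1] };
            }
        }
        return null;
    }
}
```

Because a point on an edge shared by two triangles depends only on the two endpoints of that edge, both neighboring patches map it to the same target position, which is why the affine mesh warp stays continuous across patch boundaries.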

Local transformations of this type are frequently used, for example, to register aerial and satellite images or to undistort images for panoramic stitching. In computer graphics, similar techniques are used to map texture images onto polygonal 3D surfaces in the rendered 2D image. Another popular application of this technique is “morphing” [256], which performs a stepwise geometric transformation from one image to another while simultaneously blending their intensity (or color) values.⁴

Fig. 21.8

Mesh partitioning examples. Almost arbitrary image deformations can be implemented by partitioning the image plane into nonoverlapping triangles $\mathcal{D}_i, \mathcal{D}'_i$ (a) or quadrilaterals $Q_i, Q'_i$ (b) and applying simple local transformations. Every patch in the resulting mesh is transformed separately with the required transformation parameters derived from the corresponding three or four corner points, respectively.


21.2 Resampling the Image

In the discussion of geometric transformations, we have so far considered the 2D image coordinates as being continuous (i.e., real-valued). In reality, the picture elements in digital images reside at discrete (i.e., integer-valued) coordinates, and thus transferring a discrete image into another discrete image without introducing significant losses in quality is a nontrivial subproblem in the implementation of geometric transformations.

Based on the original image $I(u, v)$ and some (continuous) geometric transformation $T(x, y)$, the aim is to create a transformed image $I'(u', v')$ where all coordinates are discrete (i.e., $u, v \in \mathbb{Z}$ and $u', v' \in \mathbb{Z}$).⁵ This can be accomplished in one of two ways, which differ by the mapping direction and are commonly referred to as source-to-target or target-to-source mapping, respectively.

4 Image morphing has also been implemented in ImageJ as a plugin (iMorph) by Hajime Hirase (http://rsb.info.nih.gov/ij/plugins/morph.html).

Fig. 21.9

Source-to-target mapping. For each discrete pixel position $(u, v)$ in the source image I, the corresponding (continuous) target position $(x', y')$ is found by applying the geometric transformation $T(u, v)$. In general, the target position $(x', y')$ does not coincide with any discrete raster point. The source pixel value $I(u, v)$ is subsequently transferred to one of the adjacent target pixels.

21.2.1 Source-to-Target Mapping

In this approach, which appears quite natural at first sight, we compute for every pixel $(u, v)$ of the original (source) image I the corresponding transformed position

$$
(x', y') = T(u, v) \tag{21.72}
$$

in the target image I'. In general, the result will not coincide with any of the raster points, as illustrated in Fig. 21.9. Subsequently, we would have to decide in which pixel of the target image I' the original intensity or color value from $I(u, v)$ should be stored. We could perhaps even think of somehow distributing this value onto all adjacent pixels.

The problem with the source-to-target method is that, depending on the geometric transformation T, some elements in the target image I' may never be “hit” at all (i.e., never receive a source pixel value)! This happens, for example, when the image is enlarged (even slightly) by the geometric transformation. The resulting holes in the target image would be difficult to close in a subsequent processing step. Conversely, one would have to consider (e.g., when the image is shrunk) that a single element in the target image I' may be hit by multiple source pixels and thus image content may get lost. In light of all these complications, source-to-target mapping is not really the method of choice.
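To make the hole problem concrete, the following sketch (hypothetical class name) enlarges a W × H image by a factor s using source-to-target mapping: each source pixel (u, v) is sent to the target pixel nearest to T(u, v) = (s·u, s·v), and the target pixels that are never hit are counted. Rounding to the nearest pixel is a simplifying assumption of this illustration.

```java
/** Counts the holes left by source-to-target mapping when enlarging an image by a factor s. */
final class ForwardMappingHoles {

    static int countHoles(int width, int height, double s) {
        int w2 = (int) Math.ceil(width * s), h2 = (int) Math.ceil(height * s);
        boolean[][] hit = new boolean[w2][h2];
        for (int u = 0; u < width; u++) {
            for (int v = 0; v < height; v++) {
                int up = (int) Math.round(s * u);       // T(u, v), rounded to the nearest target pixel
                int vp = (int) Math.round(s * v);
                if (up < w2 && vp < h2) hit[up][vp] = true;
            }
        }
        int holes = 0;
        for (boolean[] column : hit) {
            for (boolean b : column) {
                if (!b) holes++;
            }
        }
        return holes;
    }
}
```

For a 10 × 10 image and s = 1.5, the 100 source pixels land on 100 distinct positions of the 15 × 15 target image, so 125 of its 225 pixels are never filled; with s = 1 no holes occur.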

21.2.2 Target-to-Source Mapping

This method avoids most difficulties encountered in source-to-target mapping by simply reversing the image generation process. For every discrete pixel position $(u', v')$ in the target image, we determine the corresponding (continuous) point

$$
(x, y) = T^{-1}(u', v') \tag{21.73}
$$

in the source image plane using the inverse geometric transformation $T^{-1}$. Of course, the coordinate $(x, y)$ again does not fall onto a raster point in general (Fig. 21.10), and thus we have to decide from which of the neighboring source pixels to extract the resulting target pixel value. This problem of interpolating among intensity values is discussed in detail in Chapter 22.

5 Remark on notation: We mostly use $(u, v)$ or $(u', v')$ to denote discrete (integer) coordinates and $(x, y)$ or $(x', y')$ for continuous (real-valued) coordinates.

Fig. 21.10

Target-to-source mapping. For each discrete pixel position $(u', v')$ in the target image I', the corresponding continuous source position $(x, y)$ is found by applying the inverse mapping function $T^{-1}(u', v')$. The new pixel value $I'(u', v')$ is determined by interpolating the pixel values in the source image within some neighborhood of $(x, y)$.

The major advantage of the target-to-source method is that all pixels in the target image I' (and only these) are computed and filled exactly once, such that no holes or multiple hits can occur. This, however, requires the inverse geometric transformation $T^{-1}$ to be available, which is no disadvantage in most cases since the forward transformation T itself is never really needed. Due to its simplicity, which is also demonstrated in Alg. 21.1, target-to-source mapping is the common method for geometrically transforming 2D images.


1: TransformImage(I, T)
   Input: I, source image; T, continuous mapping ℝ² → ℝ².
   Returns the transformed image.
2:   (M, N) ← Size(I)
3:   I' ← Duplicate(I)                              ▷ create the target image
4:   for all (u, v) ∈ M × N do                      ▷ loop over all target pixels
5:     (x, y) ← T⁻¹(u, v)
6:     I'(u, v) ← GetInterpolatedValue(I, x, y)
7:   return I'

Alg. 21.1

Geometric image transformation using target-to-source mapping. Given are the original (source) image I and the continuous coordinate transformation T. GetInterpolatedValue(I, x, y) returns the interpolated value of the source image I at the continuous position (x, y).
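Alg. 21.1 maps almost line by line onto Java. The sketch below is a hypothetical, library-free version, not the imagingbook implementation: the image is a float array indexed as img[u][v], the inverse mapping T⁻¹ is passed as a java.util.function.UnaryOperator<double[]>, and GetInterpolatedValue is replaced by plain nearest-neighbor lookup with a constant fill value for positions outside the source image (proper interpolation methods are the subject of Chapter 22).

```java
import java.util.function.UnaryOperator;

/** Target-to-source image transformation following Alg. 21.1 (nearest-neighbor sampling). */
final class TargetToSource {

    /** Stand-in for GetInterpolatedValue(I, x, y): nearest pixel, or fill if outside the image. */
    static float getInterpolatedValue(float[][] img, double x, double y, float fill) {
        int u = (int) Math.round(x);
        int v = (int) Math.round(y);
        if (u < 0 || u >= img.length || v < 0 || v >= img[0].length) {
            return fill;
        }
        return img[u][v];
    }

    /** For every target pixel (u, v), samples the source image at (x, y) = T^-1(u, v). */
    static float[][] transformImage(float[][] src, UnaryOperator<double[]> inverseT, float fill) {
        int m = src.length, n = src[0].length;        // (M, N) <- Size(I)
        float[][] dst = new float[m][n];              // target image of the same size
        for (int u = 0; u < m; u++) {
            for (int v = 0; v < n; v++) {
                double[] xy = inverseT.apply(new double[] { u, v });
                dst[u][v] = getInterpolatedValue(src, xy[0], xy[1], fill);
            }
        }
        return dst;
    }
}
```

Every target pixel is visited exactly once, so the result has no holes regardless of the mapping; for example, passing p -> new double[] { p[0] - 1, p[1] } shifts the image one pixel to the right and fills the first column with the fill value.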


21.3 Java Implementation

In plain ImageJ, only a few simple geometric operations, such as rotation and flipping, are provided as methods of the ImageProcessor class.⁶ This section outlines the implementation of the transformations covered in this chapter, which is openly available as part of the imagingbook library.⁷

6 Additional operations, including affine transformations, are available as plugin classes as part of the optional TransformJ package [162].

7 Package imagingbook.pub.geometry.mappings.
21.3.1 General Mappings (Class Mapping)

The abstract class Mapping is the superclass for all subsequent transformations. All subclasses of Mapping are required to implement the method applyTo(double[] pnt), which applies the associated transformation to a given coordinate point and returns the transformed point. The actual transformations are implemented by its concrete subclasses. The applyTo() method is defined in multiple versions with different signatures:

double[] applyTo(double[] pnt)
    Applies this transformation to the 2D point (of type double[]) and returns the transformed coordinate.

Point2D applyTo(Point2D pnt)
    Applies this transformation to the 2D point (of type Point2D) and returns the transformed coordinate.

Point2D[] applyTo(Point2D[] pnts)
    Applies this transformation to a sequence of 2D points (of type Point2D) and returns a sequence of transformed coordinates.

In addition, the Mapping class can also be used to transform entire images:

double[] applyTo(ImageProcessor source, ImageProcessor target, PixelInterpolator.Method im)
    Transforms the input image source onto the output image target by target-to-source mapping, using the pixel interpolation method im.

double[] applyTo(ImageProcessor ip, PixelInterpolator.Method im)
    Transforms the input image ip destructively, using the pixel interpolation method im.

double[] applyTo(ImageInterpolator source, ImageProcessor target)
    Transforms the input image (specified by the interpolator source) onto the output image target by target-to-source mapping.

Other methods defined by class Mapping:

Mapping duplicate()
    Returns a copy of this mapping.

Mapping getInverse()
    Returns the inverse of this mapping if available. Otherwise an UnsupportedOperationException is thrown.


21.3.2 Linear Mappings

Linear transformations are implemented by class LinearMapping,⁸ with subclasses including

AffineMapping, ProjectiveMapping, Rotation, Scaling, Shear, Translation.

8 Package imagingbook.pub.geometry.mappings.linear.
21.3.3 Nonlinear Mappings




Selected nonlinear transformations are implemented by the following subclasses of Mapping:9 BilinearMapping, RippleMapping, SphereMapping, and TwirlMapping.


21.3.4 Sample Applications

The following two ImageJ plugins are simple examples of how the classes described in Secs. 21.3.2 and 21.3.3 can be used to implement geometric operations with pixel interpolation (see Ch. 22 for details). Note that these plugins can be applied to images of any type.

Example 1: image rotation

The example in Prog. 21.1 shows a plugin (Transform_Rotate) that rotates an image by 15°. First (in line 16), the geometric mapping object (map) is created as an instance of class Rotation, with the supplied angle converted from degrees to radians. The actual transformation of the image is performed by invoking the method applyTo() in line 17.
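What the Rotation object does to a single point can be sketched as follows, assuming (as for any linear mapping) a rotation about the coordinate origin. The class RotationSketch and its method rotate() are hypothetical and not part of imagingbook; the sketch also shows that the inverse transformation is simply a rotation by the negated angle.

```java
// Illustrative sketch (not the imagingbook Rotation class): a linear rotation
// about the coordinate origin, applied to a single point.
final class RotationSketch {

    /** Rotates the point p = (x, y) about the origin by 'alpha' radians. */
    static double[] rotate(double[] p, double alpha) {
        double c = Math.cos(alpha), s = Math.sin(alpha);
        return new double[] { c * p[0] - s * p[1], s * p[0] + c * p[1] };
    }

    public static void main(String[] args) {
        double angle = 15;                            // degrees, as in Prog. 21.1
        double alpha = (2 * Math.PI * angle) / 360;   // degrees -> radians, as in line 16
        double[] q = rotate(new double[] {100, 0}, alpha);
        double[] p = rotate(q, -alpha);               // the inverse rotates by -alpha
        System.out.printf("q = (%.3f, %.3f), back = (%.3f, %.3f)%n", q[0], q[1], p[0], p[1]);
    }
}
```

The conversion used in line 16 of Prog. 21.1 is equivalent, up to floating-point rounding, to Java's Math.toRadians(angle).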


1  import ij.ImagePlus;
2  import ij.plugin.filter.PlugInFilter;
3  import ij.process.ImageProcessor;
4  import imagingbook.pub.geometry.interpolators.pixel.PixelInterpolator;
5  import imagingbook.pub.geometry.mappings.Mapping;
6  import imagingbook.pub.geometry.mappings.linear.Rotation;
7
8  public class Transform_Rotate implements PlugInFilter {
9      static double angle = 15; // rotation angle (in degrees)
10
11     public int setup(String arg, ImagePlus imp) {
12         return DOES_ALL;
13     }
14
15     public void run(ImageProcessor ip) {
16         Mapping map = new Rotation((2 * Math.PI * angle) / 360);
17         map.applyTo(ip, PixelInterpolator.Method.Bicubic);
18     }
19 }


Prog. 21.1

Image rotation example using the Rotation class (ImageJ plugin).


Example 2: projective transformation

The second example, in Prog. 21.2, illustrates the implementation of a projective transformation. The geometric mapping $T$ is defined by two corresponding quadrilaterals, $P = (\mathbf{p}_0, \ldots, \mathbf{p}_3)$ and $Q = (\mathbf{q}_0, \ldots, \mathbf{q}_3)$. In a real application, these points would probably be specified interactively or obtained as the result of a mesh partitioning.

9 Package imagingbook.pub.geometry.mappings.nonlinear.
Prog. 21.2

Projective image transformation example using the ProjectiveMapping class (ImageJ plugin).
1  import ij.ImagePlus;
2  import ij.plugin.filter.PlugInFilter;
3  import ij.process.ImageProcessor;
4  import imagingbook.pub.geometry.interpolators.pixel.PixelInterpolator;
5  import imagingbook.pub.geometry.mappings.Mapping;
6  import imagingbook.pub.geometry.mappings.linear.ProjectiveMapping;
7  import java.awt.Point;
8  import java.awt.geom.Point2D;
9
10 public class Transform_Projective implements PlugInFilter {
11
12     public int setup(String arg, ImagePlus imp) {
13         return DOES_ALL;
14     }
15
16     public void run(ImageProcessor ip) {
17         Point2D p0 = new Point(0, 0);
18         Point2D p1 = new Point(400, 0);
19         Point2D p2 = new Point(400, 400);
20         Point2D p3 = new Point(0, 400);
21
22         Point2D q0 = new Point(0, 60);
23         Point2D q1 = new Point(400, 20);
24         Point2D q2 = new Point(300, 400);
25         Point2D q3 = new Point(30, 200);
26
27         Mapping map = new
28             ProjectiveMapping(p0, p1, p2, p3, q0, q1, q2, q3);
29
30         map.applyTo(ip, PixelInterpolator.Method.Bilinear);
31     }
32 }


The transformation object map (representing the forward transformation $T$) is created by calling the associated constructor ProjectiveMapping() in line 28. The mapping is applied to the input image (line 30) as in the previous example, except that bilinear pixel interpolation is used.
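The listing does not show how ProjectiveMapping derives its parameters from the four point pairs. One common approach, sketched below under our own assumptions (the class ProjectiveFromFourPoints and its methods are hypothetical, not the imagingbook API), writes the mapping as $x' = (a_{00}x + a_{01}y + a_{02})/(a_{20}x + a_{21}y + 1)$ and $y' = (a_{10}x + a_{11}y + a_{12})/(a_{20}x + a_{21}y + 1)$. Multiplying out the denominators gives two linear equations per point pair, and the resulting 8 × 8 system is solved by Gaussian elimination. The main() method uses the point pairs of Prog. 21.2 and checks that each $\mathbf{p}_i$ is mapped onto $\mathbf{q}_i$.

```java
// Hypothetical sketch (not the imagingbook ProjectiveMapping implementation):
// estimates a projective mapping from four point pairs p[i] -> q[i].
final class ProjectiveFromFourPoints {

    /** Returns the 3 x 3 matrix A (with a22 = 1) that maps each p[i] onto q[i]. */
    static double[][] fromPoints(double[][] p, double[][] q) {
        double[][] m = new double[8][];   // augmented system [M | rhs]
        for (int i = 0; i < 4; i++) {
            double x = p[i][0], y = p[i][1], xq = q[i][0], yq = q[i][1];
            m[2 * i]     = new double[] { x, y, 1, 0, 0, 0, -x * xq, -y * xq, xq };
            m[2 * i + 1] = new double[] { 0, 0, 0, x, y, 1, -x * yq, -y * yq, yq };
        }
        double[] h = solve(m);
        return new double[][] {
            { h[0], h[1], h[2] },
            { h[3], h[4], h[5] },
            { h[6], h[7], 1.0  } };
    }

    /** Solves an n x (n+1) augmented system by Gaussian elimination with partial pivoting. */
    static double[] solve(double[][] m) {
        int n = m.length;
        for (int col = 0; col < n; col++) {
            int piv = col;
            for (int r = col + 1; r < n; r++) {
                if (Math.abs(m[r][col]) > Math.abs(m[piv][col])) piv = r;
            }
            double[] tmp = m[col]; m[col] = m[piv]; m[piv] = tmp;
            if (Math.abs(m[col][col]) < 1e-12) {
                throw new IllegalArgumentException("degenerate point configuration");
            }
            for (int r = col + 1; r < n; r++) {
                double f = m[r][col] / m[col][col];
                for (int c = col; c <= n; c++) m[r][c] -= f * m[col][c];
            }
        }
        double[] h = new double[n];      // back substitution
        for (int r = n - 1; r >= 0; r--) {
            double s = m[r][n];
            for (int c = r + 1; c < n; c++) s -= m[r][c] * h[c];
            h[r] = s / m[r][r];
        }
        return h;
    }

    /** Applies A to the 2D point p via homogeneous coordinates. */
    static double[] apply(double[][] a, double[] p) {
        double w = a[2][0] * p[0] + a[2][1] * p[1] + a[2][2];
        return new double[] {
            (a[0][0] * p[0] + a[0][1] * p[1] + a[0][2]) / w,
            (a[1][0] * p[0] + a[1][1] * p[1] + a[1][2]) / w };
    }

    public static void main(String[] args) {
        double[][] pts = { {0, 0}, {400, 0}, {400, 400}, {0, 400} };    // P, as in Prog. 21.2
        double[][] qts = { {0, 60}, {400, 20}, {300, 400}, {30, 200} }; // Q, as in Prog. 21.2
        double[][] a = fromPoints(pts, qts);
        for (int i = 0; i < 4; i++) {
            double[] r = apply(a, pts[i]);
            System.out.printf("p%d -> (%.3f, %.3f)%n", i, r[0], r[1]);
        }
    }
}
```

Fixing $a_{22} = 1$ excludes mappings that send the origin to infinity; for typical quadrilaterals such as the one above this is not a restriction.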


21.4 Exercises

Exercise 21.1. Show that a straight line $y = kx + d$ in 2D is mapped to another straight line under a projective transformation, as defined in Eqn. (21.32).

Exercise 21.2. Show that parallel lines remain parallel under an affine transformation (Eqn. (21.20)).

Exercise 21.3. Design a nonlinear geometric transformation similar to the ripple transformation (Eqn. (21.67)) that uses a sawtooth function instead of a sinusoid for the distortions in the horizontal




Fig. 21.11

Examples of the nonlinear geometric transformations defined in Exercise 21.4. The reference point $\mathbf{x}_c$ is always taken at the image center. Panels: (a) original image; (b) radial wave ($a = 10.0$, $\tau = 38$); (c) clover ($a = 0.2$, $N = 8$); (d) spiral ($a = 0.01$); (e) angular wave ($a = 0.1$, $\tau = 38$); (f) tapestry ($a = 5.0$, $\tau_x = \tau_y = 30$).


and vertical directions. Use the class TwirlMapping as a template for your implementation.

Exercise 21.4. Implement one or more of the following nonlinear geometric transformations (see Fig. 21.11):

A. Radial wave transformation: This transformation simulates an omnidirectional wave that originates from a fixed center point $\mathbf{x}_c$ (see Fig. 21.11(b)). The inverse transformation (applied to a target image point $\mathbf{x}' = (x', y')$) is

$$
T^{-1}:\; \mathbf{x} =
\begin{cases}
\mathbf{x}' & \text{for } r = 0,\\[2pt]
\mathbf{x}_c + \dfrac{r + \delta}{r} \cdot (\mathbf{x}' - \mathbf{x}_c) & \text{for } r > 0,
\end{cases}
\tag{21.74}
$$

with $r = \|\mathbf{x}' - \mathbf{x}_c\|$ and $\delta = a \cdot \sin(2\pi r / \tau)$. Parameter $a$ specifies the amplitude (strength) of the distortion and $\tau$ is the period (width) of the radial wave (in pixel units).

B. Clover transformation: This transformation distorts the image in the form of an $N$-leafed clover shape (see Fig. 21.11(c)). The associated inverse transformation is the same as in Eqn. (21.74) but uses

$$
\delta = a \cdot r \cdot \cos(N \cdot \alpha), \quad \text{with} \quad \alpha = \angle(\mathbf{x}' - \mathbf{x}_c)
\tag{21.75}
$$

instead. Again, $r = \|\mathbf{x}' - \mathbf{x}_c\|$ is the distance of the target image point $\mathbf{x}'$ from the designated center point $\mathbf{x}_c$. Parameter $a$ specifies the amplitude of the distortion and $N$ is the number of radial "leaves".

C. Spiral transformation: This transformation (see Fig. 21.11(d)) is similar to the twirl transformation in Eqns. (21.63)-(21.64) and is defined by the inverse transformation

$$
T^{-1}:\; \mathbf{x} = \mathbf{x}_c + r \cdot \begin{pmatrix} \cos(\beta) \\ \sin(\beta) \end{pmatrix},
\tag{21.76}
$$

with $\beta = \angle(\mathbf{x}' - \mathbf{x}_c) + a \cdot r$ and $r = \|\mathbf{x}' - \mathbf{x}_c\|$ denoting the distance between the target point $\mathbf{x}'$ and the center point $\mathbf{x}_c$. The angle $\beta$ increases linearly with $r$; parameter $a$ specifies the "velocity" of the spiral.


D. Angular wave transformation: This is another variant of the twirl transformation in Eqns. (21.63)-(21.64). Its inverse transformation is the same as for the spiral mapping in Eqn. (21.76), but in this case

$$
\beta = \angle(\mathbf{x}' - \mathbf{x}_c) + a \cdot \sin\!\left(\frac{2\pi r}{\tau}\right).
\tag{21.77}
$$

Thus the angle $\beta$ is modified by a sine function with amplitude $a$ (see Fig. 21.11(e)).

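E. Tapestry transformation: In this case the inverse transformation of a target point $\mathbf{x}' = (x', y')$ is

$$
T^{-1}:\; \mathbf{x} = \mathbf{x}' + a \cdot
\begin{pmatrix}
\sin\!\big(\frac{2\pi}{\tau_x} \cdot (x' - x_c)\big) \\
\sin\!\big(\frac{2\pi}{\tau_y} \cdot (y' - y_c)\big)
\end{pmatrix},
\tag{21.78}
$$

again with the center point $\mathbf{x}_c = (x_c, y_c)$. Parameter $a$ specifies the distortion's amplitude and $\tau_x, \tau_y$ are the wavelengths (measured in pixel units) along the x and y axes, respectively (see Fig. 21.11(f)).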
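As a starting point for Exercise 21.4, the radial wave (A) and spiral (C) inverse mappings can be written directly from Eqns. (21.74) and (21.76). The sketch below uses hypothetical static methods on plain double[] points instead of Mapping subclasses; wrapping them in a subclass of Mapping (e.g., following TwirlMapping) remains part of the exercise. Computing the angle $\angle(\mathbf{x}' - \mathbf{x}_c)$ with Math.atan2 is one possible choice.

```java
// Sketch of two inverse mappings from Exercise 21.4, written directly from
// Eqns. (21.74) and (21.76); names are hypothetical. Points are double[] {x, y}.
final class NonlinearInverseSketch {

    /** Radial wave, Eqn. (21.74): maps the target point xp to a source point. */
    static double[] radialWave(double[] xp, double[] xc, double a, double tau) {
        double dx = xp[0] - xc[0], dy = xp[1] - xc[1];
        double r = Math.hypot(dx, dy);
        if (r == 0) {
            return new double[] { xp[0], xp[1] };   // the center point stays fixed
        }
        double delta = a * Math.sin(2 * Math.PI * r / tau);
        double s = (r + delta) / r;                 // radial scale factor
        return new double[] { xc[0] + s * dx, xc[1] + s * dy };
    }

    /** Spiral, Eqn. (21.76): beta = angle(xp - xc) + a * r, distance r is kept. */
    static double[] spiral(double[] xp, double[] xc, double a) {
        double dx = xp[0] - xc[0], dy = xp[1] - xc[1];
        double r = Math.hypot(dx, dy);
        double beta = Math.atan2(dy, dx) + a * r;
        return new double[] { xc[0] + r * Math.cos(beta), xc[1] + r * Math.sin(beta) };
    }

    public static void main(String[] args) {
        double[] xc = { 200, 150 };
        double[] w = radialWave(new double[] { 209.5, 150 }, xc, 10.0, 38);
        double[] s = spiral(new double[] { 300, 150 }, xc, 0.01);
        System.out.printf("radial wave: (%.2f, %.2f)%n", w[0], w[1]);
        System.out.printf("spiral:      (%.2f, %.2f)%n", s[0], s[1]);
    }
}
```

In the radial-wave call, the target point lies a quarter period ($r = \tau/4$) from the center, so $\delta = a$ and the point is pushed outward by exactly $a$ pixels.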


Exercise 21.5. Implement an interactive program (plugin) that performs projective rectification (see Sec. 21.1.4) of a selected quadrilateral, as shown in Fig. 21.12. Make your program perform the following steps:

1. Let the user mark the source quad in the source image $I$ as a polygon-shaped region of interest (ROI) with at least four points $\mathbf{x}_0, \ldots, \mathbf{x}_3$. In ImageJ this is easily done with the built-in polygon selection tool (see Prog. 21.3 for handling ROI points).

2. Create an output image $I_r$ of fixed size (e.g., proportional to A4 or Letter paper size).

3. The target rectangle is defined by the four corners $\mathbf{x}'_0, \ldots, \mathbf{x}'_3$ of the output image. The source and target points are associated 1:1, that is, the four corresponding point pairs are $(\mathbf{x}_0, \mathbf{x}'_0), \ldots, (\mathbf{x}_3, \mathbf{x}'_3)$.

4. From the four point pairs, create an instance of ProjectiveMapping, as demonstrated in Prog. 21.2.

5. Test the obtained mapping by applying it to the specified source points $\mathbf{x}_0, \ldots, \mathbf{x}_3$. Make sure they project exactly onto the specified target points $\mathbf{x}'_0, \ldots, \mathbf{x}'_3$.

6. Apply the obtained mapping from the source to the target image using the method10

void applyTo(ImageProcessor source, ImageProcessor target, PixelInterpolator.Method im).

7. Show the resulting output image.


Fig. 21.12

Projective rectification example (see Exercise 21.5). Source image and user-defined selection (a); transformed output image (b).


10 Defined in class imagingbook.pub.geometry.mappings.Mapping.
Prog. 21.3

ImageJ plugin demonstrating the extraction of vertex points from a user-selected polygon ROI (region of interest). Notice that (in line 21) the ROI is obtained from the associated ImagePlus instance (to which a reference is kept in line 16) and not from the supplied ImageProcessor object. ImageJ's ROI coordinates are, in general, integer positions.


1  import java.awt.Point;
2  import java.awt.Polygon;
3  import java.awt.geom.Point2D;
4  import ij.IJ;
5  import ij.ImagePlus;
6  import ij.gui.PolygonRoi;
7  import ij.gui.Roi;
8  import ij.plugin.filter.PlugInFilter;
9  import ij.process.ImageProcessor;
10
11 public class Get_Roi_Points implements PlugInFilter {
12
13     ImagePlus im = null;
14
15     public int setup(String args, ImagePlus im) {
16         this.im = im; // keep a reference to im
17         return DOES_ALL + ROI_REQUIRED;
18     }
19
20     public void run(ImageProcessor source) {
21         Roi roi = im.getRoi();
22         if (!(roi instanceof PolygonRoi)) {
23             IJ.error("Polygon selection required!");
24             return;
25         }
26
27         Polygon poly = roi.getPolygon();
28
29         // copy polygon vertices to a point array:
30         Point2D[] pts = new Point2D[poly.npoints];
31         for (int i = 0; i < poly.npoints; i++) {
32             pts[i] = new Point(poly.xpoints[i], poly.ypoints[i]);
33         }
34
35         // ... use the ROI points in pts
36
37     }
38
39 }
Pixel Interpolation


Interpolation is the process of estimating the intermediate values of a sampled function or signal at continuous positions, or the attempt to reconstruct the original continuous function from a set of discrete samples. In the context of geometric operations, this task arises from the fact that, under a continuous geometric transformation $T$ (or $T^{-1}$, respectively), discrete pixel positions in one image are generally not mapped to discrete raster positions in the other image. The concrete goal is to obtain an optimal estimate for the value of the 2D image function $I(x, y)$ at any continuous position $(x, y) \in \mathbb{R}^2$, in order to implement the function

GetInterpolatedValue(I, x, y),

which we defined in Chapter 21 (see Alg. 21.1). Ideally, the interpolated image should preserve as much detail (i.e., sharpness) as possible without causing visible artifacts such as ringing or moiré patterns.


22.1 Simple Interpolation Methods

To illustrate the problem, we first attend to the 1D case (see Fig. 22.1). Several simple, ad hoc methods exist for interpolating the values of a discrete function $g(u)$, with $u \in \mathbb{Z}$, at arbitrary continuous positions $x \in \mathbb{R}$. The simplest of all interpolation methods is to round the continuous coordinate $x$ to the closest integer $u_x$ and use the associated sample $g(u_x)$ as the interpolated value, that is,

$$
\hat{g}(x) = g(u_x),
\tag{22.1}
$$

with $u_x = \mathrm{round}(x) = \lfloor x + 0.5 \rfloor$. A typical result of this so-called nearest-neighbor interpolation is shown in Fig. 22.2(a).

Another simple method is linear interpolation. Here the estimated value is the weighted sum of the two closest samples $g(u_0)$ and $g(u_0 + 1)$, with $u_0 = \lfloor x \rfloor$. The weight of each sample is proportional to its closeness to the continuous position $x$, that is,
Fig. 22.1

Interpolating a discrete function in 1D. Given the discrete function values $g(u)$ (a), the goal is to estimate the original function $f(x)$ at arbitrary continuous positions $x \in \mathbb{R}$ (b).

Fig. 22.2

Simple interpolation methods. The nearest-neighbor interpolation (a) simply selects the discrete sample $g(u)$ closest to the given continuous coordinate $x$ as the interpolating value $\hat{g}(x)$. Under linear interpolation (b), the result is a piecewise linear function connecting adjacent samples $g(u)$ and $g(u + 1)$.


$$
\begin{aligned}
\hat{g}(x) &= g(u_0) + (x - u_0) \cdot \big(g(u_0 + 1) - g(u_0)\big) \\
&= g(u_0) \cdot \big(1 - (x - u_0)\big) + g(u_0 + 1) \cdot (x - u_0).
\end{aligned}
\tag{22.2}
$$


As  shown  in  Fig.  22.2(b),  the  result  is  a  piecewise  linear  function 
made  up  of  straight  line  segments  between  consecutive  sample  values. 
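Eqns. (22.1) and (22.2) translate almost literally into code. The following minimal sketch (the class and method names are ours) works on a 1D array g. How to treat positions outside the sample range is not discussed in the text, so indices are simply clamped to the array bounds here, which is an assumption of this sketch.

```java
// Minimal 1D sketch of Eqns. (22.1) and (22.2). Clamping indices at the
// array borders is an assumption; the text does not address the borders.
final class SimpleInterpolation {

    /** Nearest-neighbor interpolation, Eqn. (22.1), with u_x = floor(x + 0.5). */
    static double nearest(double[] g, double x) {
        int ux = (int) Math.floor(x + 0.5);
        return g[clamp(ux, g.length)];
    }

    /** Linear interpolation, Eqn. (22.2), with u_0 = floor(x). */
    static double linear(double[] g, double x) {
        int u0 = (int) Math.floor(x);
        double t = x - u0;                          // 0 <= t < 1
        return g[clamp(u0, g.length)] * (1 - t) + g[clamp(u0 + 1, g.length)] * t;
    }

    static int clamp(int u, int n) {
        return Math.max(0, Math.min(n - 1, u));
    }

    public static void main(String[] args) {
        double[] g = { 0, 10, 20, 15 };
        System.out.println(nearest(g, 1.4) + " " + nearest(g, 1.5));  // 10.0 20.0
        System.out.println(linear(g, 1.25) + " " + linear(g, 2.5));   // 12.5 17.5
    }
}
```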


22.1.1 Ideal Low-Pass Filter

Obviously, the results of these simple interpolation methods do not approximate the original continuous function well (Fig. 22.1). But how can we obtain a better approximation from the discrete samples alone when the original function is unknown? This may appear hopeless at first, because the discrete samples $g(u)$ could originate from any continuous function $f(x)$ with identical values at the discrete sample positions.

We find an intuitive answer to this question (once again) by looking at the functions in the spectral domain. If the original function $f(x)$ was discretized in accordance with the sampling theorem (see Ch. 18, Sec. 18.2.1), then $f(x)$ must have been "band limited", that is, it could not contain any signal components with frequencies higher than half the sampling frequency $\omega_s$. This means that the reconstructed signal can only contain a limited set of frequencies, and thus its trajectory between the discrete sample values is not arbitrary but naturally constrained.

In this context, absolute units of measure are of no concern since in a digital signal all frequencies relate to the sampling frequency. In particular, if we take $\tau_s = 1$ as the (unitless) sampling interval, the resulting sampling frequency is

$$
\omega_s = 2\pi \cdot f_s = 2\pi \cdot \frac{1}{\tau_s} = 2\pi,
\tag{22.3}
$$

and thus the maximum signal frequency is $\omega_{\max} = \frac{\omega_s}{2} = \pi$. To isolate the frequency range $-\omega_{\max}, \ldots, \omega_{\max}$ in the corresponding (periodic) Fourier spectrum, we multiply the spectrum $G(\omega)$ by a square windowing function $\Pi_\pi(\omega)$ of width $\pm\omega_{\max} = \pm\pi$,

$$
\hat{G}(\omega) = G(\omega) \cdot \Pi_\pi(\omega),
\quad \text{with} \quad
\Pi_\pi(\omega) =
\begin{cases}
1 & \text{for } -\pi < \omega < \pi,\\
0 & \text{otherwise}.
\end{cases}
\tag{22.4}
$$


This is called an ideal low-pass filter, which cuts off all signal components with frequencies greater than $\pi$ and keeps all lower-frequency components unchanged. In the signal domain, the operation in Eqn. (22.4) corresponds (see Eqn. (18.27)) to a linear convolution with the inverse Fourier transform of the windowing function $\Pi_\pi(\omega)$, which is the Sinc function, defined as

$$
\mathrm{Sinc}(x) =
\begin{cases}
1 & \text{for } x = 0,\\
\dfrac{\sin(\pi x)}{\pi x} & \text{for } x \neq 0,
\end{cases}
\tag{22.5}
$$

and shown in Fig. 22.3 (see also Ch. 18, Table 18.1). This correspondence between convolution in the signal domain and simple multiplication in the frequency domain, which was already discussed in Chapter 18, Sec. 18.1.6, is summarized in Fig. 22.4.


Fig. 22.3

Sinc function in 1D. The function Sinc(x) has the value 1 at the origin and zero values at all other integer positions. The dashed line plots the amplitude $\left|\frac{1}{\pi x}\right|$ of the underlying sine function.


So, theoretically, Sinc(x) is the ideal interpolation function for reconstructing a frequency-limited continuous signal. To compute the interpolated value for the discrete function $g(u)$ at an arbitrary position $x_0$, the Sinc function is shifted to $x_0$ (such that its origin lies at $x_0$), multiplied with all sample values $g(u)$, with $u \in \mathbb{Z}$, and the results are summed; that is, $g(u)$ and Sinc(x) are convolved. The reconstructed value of the continuous function at position $x_0$ is thus

$$
\hat{g}(x_0) = [\mathrm{Sinc} * g](x_0) = \sum_{u=-\infty}^{\infty} \mathrm{Sinc}(x_0 - u) \cdot g(u),
\tag{22.6}
$$

where $*$ is the linear convolution operator (see Ch. 5, Sec. 5.3.1). If the discrete signal $g(u)$ is finite with length $N$ (as is usually the case), it is assumed to be periodic (i.e., $g(u) = g(u + kN)$ for all $k \in \mathbb{Z}$).1 In this case, Eqn. (22.6) becomes

$$
\hat{g}(x_0) = \sum_{u=-\infty}^{\infty} \mathrm{Sinc}(x_0 - u) \cdot g(u \bmod N).
\tag{22.7}
$$
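Because Eqn. (22.7) contains an infinite sum, it cannot be evaluated exactly. The sketch below (hypothetical names; the truncation radius K is our own choice and not part of the text) approximates it by summing only over samples with $|x_0 - u| \le K$, and uses Math.floorMod to implement the periodic continuation $g(u \bmod N)$, which also works for negative $u$. At integer positions $x_0$ the result reproduces the sample value up to rounding errors.

```java
// Sketch of Eqn. (22.7). The infinite sum is truncated to samples u with
// |x0 - u| <= K (an approximation chosen here, not part of the text);
// the signal is continued periodically via u mod N.
final class SincInterpolation {

    /** Sinc function, Eqn. (22.5). */
    static double sinc(double x) {
        if (x == 0) {
            return 1;
        }
        double px = Math.PI * x;
        return Math.sin(px) / px;
    }

    static double interpolate(double[] g, double x0, int K) {
        int n = g.length;
        double sum = 0;
        for (int u = (int) Math.ceil(x0 - K); u <= (int) Math.floor(x0 + K); u++) {
            sum += sinc(x0 - u) * g[Math.floorMod(u, n)];   // g(u mod N)
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] g = { 0, 10, 20, 15, 5 };
        System.out.println(interpolate(g, 2.0, 100));   // close to g(2) = 20
        System.out.println(interpolate(g, 2.5, 100));   // a value between the samples
    }
}
```

Since the weights decay only like $1/(\pi |x_0 - u|)$, a large K is needed for accurate results, which is one practical reason why the Sinc kernel is rarely used directly.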


1 This assumption is explained by the fact that a discrete Fourier spectrum implicitly corresponds to a periodic signal (also see Ch. 18, Sec. 18.2.2).
Fig. 22.4

Interpolation of a discrete signal: relation between signal and frequency space. The discrete signal $g(u)$ in signal space (left) corresponds to the periodic Fourier spectrum $G(\omega)$ in frequency space (right). The spectrum $\hat{G}(\omega)$ of the continuous signal is isolated from $G(\omega)$ by point-wise multiplication ($\times$) with the square function $\Pi_\pi(\omega)$, which constitutes an ideal low-pass filter (right). In signal space (left), this operation corresponds to a linear convolution ($*$) with the function Sinc(x).


Fig. 22.5

Interpolation by convolving with the Sinc function. The Sinc function is shifted by aligning its origin with the interpolation points $x_0 = 4.4$ (a) and $x_0 = 5$ (b). The values of the shifted Sinc function (dashed curve) at the integral positions are the weights (coefficients) for the corresponding sample values $g(u)$. When the function is interpolated at some integral position, such as $x_0 = 5$ (b), only the sample value $g(x_0) = g(5)$ is considered and weighted with 1, while all other samples coincide with the zero positions of the Sinc function and thus do not contribute to the result. Panels: (a) Sinc(x − 4.4); (b) Sinc(x − 5).

It may be surprising that the ideal interpolation of a discrete function $g(u)$ at a position $x_0$ apparently involves not only a few neighboring sample points but, in general, infinitely many values of $g(u)$, whose weights decay in magnitude with their distance from the given interpolation point $x_0$ (at the rate $\left|\frac{1}{\pi (x_0 - u)}\right|$). Figure 22.5 shows two examples for interpolating the function $g(u)$ at positions $x_0 = 4.4$ and $x_0 = 5$. If the function is interpolated at some integral position, such as $x_0 = 5$, the sample $g(u)$ at $u = x_0$ receives the weight 1, while all other samples coincide with the zero positions of the Sinc function and are thus ignored. Consequently, the resulting interpolated values are identical to the sample values $g(u)$ at all discrete positions $x = u$.

If a continuous signal is properly frequency limited (by half the sampling frequency, $\omega_s/2$), it can be exactly reconstructed from the discrete signal by interpolation with the Sinc function, as Fig. 22.6(a) demonstrates. Problems occur, however, around local high-frequency signal events, such as rapid transitions or pulses, as shown in Fig. 22.6(b, c). In those situations, Sinc interpolation causes strong overshooting or "ringing" artifacts, which are perceived as visually disturbing. For practical applications, the Sinc function is therefore unsuitable as an interpolation kernel, and not only because of its infinite extent (and the resulting noncomputability).

A good interpolation function implements a low-pass filter that, on the one hand, introduces minimal blurring by maintaining the maximum signal bandwidth but, on the other hand, also delivers a good reconstruction at rapid signal transitions. In this regard, the Sinc function is an extreme choice: it implements an ideal low-pass filter and thus preserves a maximum bandwidth and signal continuity, but gives inferior results at signal transitions. At the opposite extreme, nearest-neighbor interpolation (see Fig. 22.2) can perfectly handle steps and pulses but generally fails to produce a continuous signal reconstruction between sample points. The design of an interpolation function thus always involves a trade-off, and the quality of the results often depends on the particular application and subjective judgment. In the following, we discuss some common interpolation functions that come close to this goal and are therefore frequently used in practice.




Fig. 22.6

Sinc interpolation applied to various signal types. The reconstructed function in (a) is identical to the continuous, band-limited original. The results for the step function (b) and the pulse function (c) show the strong ringing caused by Sinc (ideal low-pass) interpolation.


22.2 Interpolation by Convolution

As we saw earlier in the context of Sinc interpolation (Eqn. (22.6)), the reconstruction of a continuous signal can be described as a linear convolution operation. In general, we can express interpolation as a convolution of the given discrete function $g(u)$ with some continuous interpolation kernel $w(x)$ as

$$
\hat{g}(x_0) = [w * g](x_0) = \sum_{u=-\infty}^{\infty} w(x_0 - u) \cdot g(u).
\tag{22.8}
$$


The Sinc interpolation in Eqn. (22.6) is obviously only a special case with $w(x) = \mathrm{Sinc}(x)$. Similarly, the 1D nearest-neighbor interpolation (Eqn. (22.1), Fig. 22.2(a)) can be expressed as a linear convolution with the kernel

$$
w_{\mathrm{nn}}(x) =
\begin{cases}
1 & \text{for } -0.5 \le x < 0.5,\\
0 & \text{otherwise},
\end{cases}
\tag{22.9}
$$

and the linear interpolation (see Eqn. (22.2), Fig. 22.2(b)) with the kernel

$$
w_{\mathrm{lin}}(x) =
\begin{cases}
1 - |x| & \text{for } |x| < 1,\\
0 & \text{for } |x| \ge 1.
\end{cases}
\tag{22.10}
$$


Both interpolation kernels, $w_{\mathrm{nn}}(x)$ and $w_{\mathrm{lin}}(x)$, are shown in Fig. 22.7, and results for various function types are plotted in Fig. 22.8.
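Eqn. (22.8) can be implemented once for any kernel of finite support and then specialized by passing a kernel function. In the following sketch (names are ours, not a library API), the kernel is a java.util.function.DoubleUnaryOperator, only samples with $|x_0 - u| \le$ radius are visited, and samples outside the array are treated as zero, which is an assumption. With $w_{\mathrm{nn}}$ and $w_{\mathrm{lin}}$ from Eqns. (22.9) and (22.10), it reproduces nearest-neighbor and linear interpolation.

```java
import java.util.function.DoubleUnaryOperator;

// Sketch of Eqn. (22.8) for kernels that vanish outside [-radius, radius],
// so that only finitely many samples contribute. Samples outside the array
// are treated as zero, which is an assumption of this sketch.
final class KernelInterpolation {

    /** Nearest-neighbor kernel, Eqn. (22.9). */
    static double wNN(double x) {
        return (x >= -0.5 && x < 0.5) ? 1 : 0;
    }

    /** Linear interpolation kernel, Eqn. (22.10). */
    static double wLin(double x) {
        double ax = Math.abs(x);
        return (ax < 1) ? 1 - ax : 0;
    }

    static double interpolate(double[] g, double x0, DoubleUnaryOperator w, int radius) {
        double sum = 0;
        for (int u = (int) Math.ceil(x0 - radius); u <= (int) Math.floor(x0 + radius); u++) {
            if (u >= 0 && u < g.length) {
                sum += w.applyAsDouble(x0 - u) * g[u];
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] g = { 0, 10, 20, 15 };
        System.out.println(interpolate(g, 1.4, KernelInterpolation::wNN, 1));   // 10.0
        System.out.println(interpolate(g, 1.25, KernelInterpolation::wLin, 1)); // 12.5
    }
}
```

Passing the kernel as a function keeps the summation loop independent of the particular interpolation method; the cubic and spline kernels discussed below fit the same scheme with radius 2.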
Fig. 22.7

Convolution kernels for the nearest-neighbor interpolation $w_{\mathrm{nn}}(x)$ and the linear interpolation $w_{\mathrm{lin}}(x)$.

Fig. 22.8

Interpolation examples (1D): nearest-neighbor interpolation (a–c), linear interpolation (d–f).


22.3 Cubic Interpolation

The Sinc function is not a useful interpolation kernel in practice, because of its infinite extent and the ringing artifacts caused by its slowly decaying oscillations. Therefore, several interpolation methods employ a truncated version of the Sinc function or an approximation of it, thereby making the convolution kernel more compact and reducing the ringing. A frequently used approximation of a truncated Sinc function is the so-called cubic interpolation, whose convolution kernel is defined as the piecewise cubic polynomial

$$
w_{\mathrm{cub}}(x, a) =
\begin{cases}
(-a + 2) \cdot |x|^3 + (a - 3) \cdot |x|^2 + 1 & \text{for } 0 \le |x| < 1,\\
-a \cdot |x|^3 + 5a \cdot |x|^2 - 8a \cdot |x| + 4a & \text{for } 1 \le |x| < 2,\\
0 & \text{for } |x| \ge 2.
\end{cases}
\tag{22.11}
$$


Parameter $a$ can be used to adjust the steepness of the spline function and thus the perceived "sharpness" of the interpolation (see Fig. 22.9(a)). For the standard value $a = 1$, Eqn. (22.11) simplifies to

$$
w_{\mathrm{cub}}(x) =
\begin{cases}
|x|^3 - 2 \cdot |x|^2 + 1 & \text{for } 0 \le |x| < 1,\\
-|x|^3 + 5 \cdot |x|^2 - 8 \cdot |x| + 4 & \text{for } 1 \le |x| < 2,\\
0 & \text{for } |x| \ge 2.
\end{cases}
\tag{22.12}
$$


The comparison of the Sinc function and the cubic interpolation kernel $w_{\mathrm{cub}}(x) = w_{\mathrm{cub}}(x, 1)$ in Fig. 22.9(b) shows that many non-negligible coefficients outside $x = \pm 2$ are truncated, and thus relatively large errors can be expected. However, because of the compactness of the cubic function, this type of interpolation can be calculated


Fig. 22.9

Cubic interpolation kernel. Function $w_{\mathrm{cub}}(x, a)$ with control parameter $a$ set to $a = 0.25$ (dashed curve), $a = 1$ (continuous curve), and $a = 1.75$ (dotted curve) (a). Cubic function $w_{\mathrm{cub}}(x)$ and Sinc function compared (b).

Fig. 22.10

Cubic interpolation examples. Parameter $a$ in Eqn. (22.11) controls the amount of signal overshoot or perceived sharpness: $a = 0.25$ (a–c), standard setting $a = 1$ (d–f), $a = 1.75$ (g–i). Notice in (d) the ripple effects incurred by interpolating with the standard setting in smooth signal regions.


very efficiently. Since $w_{\mathrm{cub}}(x) = 0$ for $|x| \ge 2$, only four discrete values $g(u)$ need to be accounted for in the convolution operation (Eqn. (22.8)) at any continuous position $x_0 \in \mathbb{R}$, that is,

$$
g(u_0 - 1),\; g(u_0),\; g(u_0 + 1),\; g(u_0 + 2), \quad \text{with} \quad u_0 = \lfloor x_0 \rfloor.
$$

This reduces the 1D cubic interpolation to the expression

$$
\hat{g}(x_0) = \sum_{u = \lfloor x_0 \rfloor - 1}^{\lfloor x_0 \rfloor + 2} w_{\mathrm{cub}}(x_0 - u) \cdot g(u).
\tag{22.13}
$$
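A direct sketch of Eqns. (22.11) and (22.13) follows (hypothetical names; clamping indices at the array borders is our assumption). Since only the four samples $g(u_0 - 1), \ldots, g(u_0 + 2)$ contribute, the loop always has exactly four iterations, independent of the signal length.

```java
// Sketch of cubic interpolation, Eqns. (22.11) and (22.13).
final class CubicInterpolation {

    /** Cubic kernel w_cub(x, a), Eqn. (22.11). */
    static double wCub(double x, double a) {
        double ax = Math.abs(x);
        if (ax < 1) {
            return (-a + 2) * ax * ax * ax + (a - 3) * ax * ax + 1;
        } else if (ax < 2) {
            return -a * ax * ax * ax + 5 * a * ax * ax - 8 * a * ax + 4 * a;
        }
        return 0;
    }

    /** Eqn. (22.13): only g(u0-1), ..., g(u0+2) with u0 = floor(x0) contribute. */
    static double interpolate(double[] g, double x0, double a) {
        int u0 = (int) Math.floor(x0);
        double sum = 0;
        for (int u = u0 - 1; u <= u0 + 2; u++) {
            int idx = Math.max(0, Math.min(g.length - 1, u));  // border clamping (assumption)
            sum += wCub(x0 - u, a) * g[idx];
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] g = { 0, 10, 20, 15, 5 };
        for (double x = 1.0; x <= 3.0; x += 0.25) {
            System.out.printf("x=%.2f  a=1: %7.3f  a=0.5: %7.3f%n",
                    x, interpolate(g, x, 1.0), interpolate(g, x, 0.5));
        }
    }
}
```

Passing a = 0.5 instead of a = 1 yields the Catmull-Rom variant mentioned below. For any a, the four weights sum to 1, so a constant signal is reproduced exactly.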

Figure 22.10 shows the results of cubic interpolation with different settings of the control parameter $a$. Notice that the cubic reconstruction obtained with the popular standard setting ($a = 1$) exhibits substantial overshooting at edges as well as strong ripple effects in the continuous parts of the signal (Fig. 22.10(d)). With $a = 0.5$, the expression in Eqn. (22.11) corresponds to a Catmull-Rom spline [44] (see also Sec. 22.4), which produces significantly better results than the standard setup (with $a = 1$), particularly in smooth signal regions (see Fig. 22.12(a–c)).
22.4 Spline Interpolation


The cubic interpolation kernel (Eqn. (22.11)) described in the previous section is a piecewise cubic polynomial function, also known as a cubic spline in computer graphics. In its general form, this function takes not only one but two control parameters $(a, b)$ [164],2

$$
w_{\mathrm{cs}}(x, a, b) = \frac{1}{6} \cdot
\begin{cases}
(-6a - 9b + 12) \cdot |x|^3 + (6a + 12b - 18) \cdot |x|^2 - 2b + 6 & \text{for } 0 \le |x| < 1,\\
(-6a - b) \cdot |x|^3 + (30a + 6b) \cdot |x|^2 + (-48a - 12b) \cdot |x| + 24a + 8b & \text{for } 1 \le |x| < 2,\\
0 & \text{for } |x| \ge 2.
\end{cases}
\tag{22.14}
$$


Equation (22.14) describes a family of smooth, $C^1$-continuous functions (i.e., with continuous first derivatives) with no visible discontinuities or sharp corners. For $b = 0$, the function $w_{\mathrm{cs}}(x, a, b)$ specifies a one-parameter family of so-called cardinal splines equivalent to the cubic interpolation function $w_{\mathrm{cub}}(x, a)$ in Eqn. (22.11),

$$
w_{\mathrm{cs}}(x, a, 0) = w_{\mathrm{cub}}(x, a),
\tag{22.15}
$$

and, for the standard setting $a = 1$ (Eqn. (22.12)) in particular,

$$
w_{\mathrm{cs}}(x, 1, 0) = w_{\mathrm{cub}}(x, 1) = w_{\mathrm{cub}}(x).
\tag{22.16}
$$


Figure 22.11 shows three additional examples of this function type that are important in the context of interpolation: Catmull-Rom splines, cubic B-splines, and the Mitchell-Netravali function. All three functions are briefly described in the following sections. The actual calculation of the interpolated signal follows exactly the same scheme as used for the cubic interpolation in Eqn. (22.13).
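The kernel family of Eqn. (22.14) and its three named members can be coded as follows (a sketch with hypothetical names). Evaluating the functions numerically is a quick way to check properties stated in the text: the Catmull-Rom spline has $w_{\mathrm{cs}}(0, 0.5, 0) = 1$ and $w_{\mathrm{cs}}(\pm 1, 0.5, 0) = 0$, whereas the cubic B-spline takes the values $2/3$ and $1/6$ at these points. The kernel for $a = b = \frac{1}{3}$ (Mitchell-Netravali) also coincides with $\frac{2}{3} w_{\mathrm{cs}}(x, 0.5, 0) + \frac{1}{3} w_{\mathrm{cs}}(x, 0, 1)$, because on each interval Eqn. (22.14) is affine in $(a, b)$.

```java
// Sketch of the two-parameter cubic spline kernel w_cs(x, a, b) of
// Eqn. (22.14) and the special cases named in the text.
final class CubicSplineKernels {

    static double wCs(double x, double a, double b) {
        double ax = Math.abs(x), ax2 = ax * ax, ax3 = ax2 * ax;
        double w;
        if (ax < 1) {
            w = (-6 * a - 9 * b + 12) * ax3 + (6 * a + 12 * b - 18) * ax2 - 2 * b + 6;
        } else if (ax < 2) {
            w = (-6 * a - b) * ax3 + (30 * a + 6 * b) * ax2 + (-48 * a - 12 * b) * ax + 24 * a + 8 * b;
        } else {
            w = 0;
        }
        return w / 6;
    }

    static double wCrm(double x) { return wCs(x, 0.5, 0); }            // Catmull-Rom, Eqn. (22.17)
    static double wCbs(double x) { return wCs(x, 0, 1); }              // cubic B-spline, Eqn. (22.18)
    static double wMn(double x)  { return wCs(x, 1.0 / 3, 1.0 / 3); }  // Mitchell-Netravali (a = b = 1/3)

    public static void main(String[] args) {
        for (double x = 0; x <= 2; x += 0.5) {
            System.out.printf("x=%.1f  crm=%.4f  cbs=%.4f  mn=%.4f%n", x, wCrm(x), wCbs(x), wMn(x));
        }
    }
}
```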


Fig. 22.11

Examples of cubic spline functions as defined in Eqn. (22.14): Catmull-Rom spline $w_{\mathrm{cs}}(x, 0.5, 0)$ (dotted line), cubic B-spline $w_{\mathrm{cs}}(x, 0, 1)$ (dashed line), and Mitchell-Netravali function $w_{\mathrm{cs}}(x, \frac{1}{3}, \frac{1}{3})$ (solid line).


22.4.1 Catmull-Rom Interpolation

With the control parameters set to $a = 0.5$ and $b = 0$, the function in Eqn. (22.14) is a Catmull-Rom spline [44], as already mentioned in Sec. 22.3:

2 In [164], the parameters $a$ and $b$ were originally named $C$ and $B$, respectively, with $B \equiv b$ and $C \equiv a$.
$$
w_{\mathrm{crm}}(x) = w_{\mathrm{cs}}(x, 0.5, 0).
\tag{22.17}
$$


Examples of signals interpolated with this kernel are shown in Fig. 22.12(a–c). The results are similar to those produced by cubic interpolation (with $a = 1$, see Fig. 22.10) with regard to sharpness, but the Catmull-Rom reconstruction is clearly superior in smooth signal regions (compare, e.g., Fig. 22.10(d) vs. Fig. 22.12(a)).


22.4.2 Cubic B-spline Approximation

With parameters set to $a = 0$ and $b = 1$, Eqn. (22.14) corresponds to a cubic B-spline function of the form

$$
w_{\mathrm{cbs}}(x) = w_{\mathrm{cs}}(x, 0, 1).
\tag{22.18}
$$
This function is nonnegative everywhere (positive for $|x| < 2$ and zero outside) and, when used as an interpolation kernel, causes a pure smoothing effect similar to a Gaussian smoothing filter (see Fig. 22.12(d-f)). The B-spline function in Eqn. (22.18) is $C^2$-continuous, that is, its first and second derivatives are continuous. Notice that, in contrast to all previously described interpolation methods, the reconstructed function does not pass through all discrete sample points. Thus, to be precise, the reconstruction with cubic B-splines is not called an interpolation but an approximation of the signal.
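A short worked check (not part of the original text) makes the approximating character concrete. Evaluating the cubic B-spline at integer offsets, using the branch of Eqn. (22.14) for $|x| < 1$ at $x = 0$ and the branch for $1 \le |x| < 2$ at $|x| = 1$, gives

$$
w_{\mathrm{cbs}}(0) = \frac{6 - 2}{6} = \frac{2}{3}, \qquad
w_{\mathrm{cbs}}(\pm 1) = \frac{-1 + 6 - 12 + 8}{6} = \frac{1}{6}, \qquad
w_{\mathrm{cbs}}(\pm 2) = 0 .
$$

Writing $g(u)$ for the discrete samples and $\hat{g}(x)$ for the reconstructed function (notation assumed here), the reconstructed value at any sample position $u$ is therefore

$$
\hat{g}(u) = \tfrac{1}{6}\, g(u-1) + \tfrac{2}{3}\, g(u) + \tfrac{1}{6}\, g(u+1),
$$

which equals $g(u)$ only where the samples are locally linear, i.e., $g(u) = \tfrac{1}{2}\bigl(g(u-1) + g(u+1)\bigr)$. An isolated peak $g = (0, 1, 0)$, for example, is reconstructed with the value $2/3$ at its center.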


22.4.3 Mitchell-Netravali Approximation

The design of an optimal interpolation kernel is always a trade-off between high bandwidth (sharpness) and good transient response (low ringing). Catmull-Rom interpolation, for example, emphasizes high sharpness, whereas cubic B-spline approximation blurs but creates no ringing. Based on empirical tests, Mitchell and Netravali [164] proposed a cubic interpolation kernel as described in Eqn. (22.14) with parameter settings $a = \tfrac{1}{3}$ and $b = \tfrac{1}{3}$ and the resulting interpolation function

$$
w_{\mathrm{mn}}(x) = w_{\mathrm{cs}}\bigl(x, \tfrac{1}{3}, \tfrac{1}{3}\bigr).
\tag{22.19}
$$
This function is the weighted sum of a Catmull-Rom spline in Eqn. (22.17) and a cubic B-spline in Eqn. (22.18).³ The examples in Fig. 22.12(g-i) show that this method is a good compromise, creating little overshoot, high edge sharpness, and good signal continuity in smooth regions. Since the resulting function does not pass through the original sample points, the Mitchell-Netravali method is again an approximation and not an interpolation.

3 See also Exercise 22.1.

Fig. 22.12

Cardinal spline reconstruction examples: Catmull-Rom interpolation (a-c), cubic B-spline approximation (d-f), and Mitchell-Netravali approximation (g-i).
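The weights of this sum can be made explicit. Every coefficient of $w_{\mathrm{cs}}(x, a, b)$ in Eqn. (22.14) is an affine function of $a$ and $b$, so an affine combination of two members of the family is again a member, with $(a, b)$ combined by the same weights. The weights $2/3$ and $1/3$ map $(0.5, 0)$ and $(0, 1)$ to $(1/3, 1/3)$, which suggests $w_{\mathrm{mn}} = \tfrac{2}{3} w_{\mathrm{crm}} + \tfrac{1}{3} w_{\mathrm{cbs}}$. The Java sketch below checks this numerically; its class and method names are illustrative only.

```java
/** Sketch: checks numerically that w_mn(x) = (2/3) w_crm(x) + (1/3) w_cbs(x). */
public class MitchellNetravaliCheck {

    // Cubic spline kernel w_cs(x, a, b), coefficients as in Eqn. (22.14)
    static double wCs(double x, double a, double b) {
        double t = Math.abs(x);
        double w;
        if (t < 1) {
            w = (-6 * a - 9 * b + 12) * t * t * t + (6 * a + 12 * b - 18) * t * t - 2 * b + 6;
        } else if (t < 2) {
            w = (-6 * a - b) * t * t * t + (30 * a + 6 * b) * t * t
                + (-48 * a - 12 * b) * t + 24 * a + 8 * b;
        } else {
            w = 0;
        }
        return w / 6;
    }

    // Mitchell-Netravali kernel, Eqn. (22.19)
    static double wMn(double x) {
        return wCs(x, 1.0 / 3, 1.0 / 3);
    }

    // Weighted sum of Catmull-Rom (Eqn. (22.17)) and cubic B-spline (Eqn. (22.18))
    static double blend(double x) {
        return 2.0 / 3 * wCs(x, 0.5, 0.0) + 1.0 / 3 * wCs(x, 0.0, 1.0);
    }

    public static void main(String[] args) {
        double maxDiff = 0;
        for (int k = -250; k <= 250; k++) {
            double x = k / 100.0;
            maxDiff = Math.max(maxDiff, Math.abs(wMn(x) - blend(x)));
        }
        System.out.println("max |w_mn - blend| = " + maxDiff);
        System.out.println("w_mn(0) = " + wMn(0) + " (not 1, hence no interpolation)");
    }
}
```

The value $w_{\mathrm{mn}}(0) = 8/9 \neq 1$ is another way of seeing why the Mitchell-Netravali reconstruction does not pass through the sample points.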


22.4.4 Lanczos Interpolation

The Lanczos⁴ interpolation belongs to the family of “windowed Sinc” methods. In contrast to the methods described in the previous sections, these do not use a polynomial (or other) approximation of the Sinc function but the Sinc function itself combined with a suitable window function $\psi(x)$, that is, an interpolation kernel of the form

$$
w(x) = \psi(x) \cdot \mathrm{Sinc}(x).
\tag{22.20}
$$

The particular window functions for the Lanczos interpolation are defined as

$$
\psi_{\mathrm{L}n}(x) =
\begin{cases}
1 & \text{for } |x| = 0,\\
\dfrac{\sin(\pi x / n)}{\pi x / n} & \text{for } 0 < |x| < n,\\
0 & \text{for } |x| \ge n,
\end{cases}
\tag{22.21}
$$


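where $n \in \mathbb{N}$ denotes the order of the filter [176, 237]. Notice that the window function is again a (stretched and truncated) Sinc function! For the Lanczos filters of order $n = 2, 3$, which are the most commonly used in image processing, the corresponding window functions are

$$
\psi_{\mathrm{L2}}(x) =
\begin{cases}
1 & \text{for } |x| = 0,\\
\dfrac{\sin(\pi x / 2)}{\pi x / 2} & \text{for } 0 < |x| < 2,\\
0 & \text{for } |x| \ge 2,
\end{cases}
\tag{22.22}
$$

$$
\psi_{\mathrm{L3}}(x) =
\begin{cases}
1 & \text{for } |x| = 0,\\
\dfrac{\sin(\pi x / 3)}{\pi x / 3} & \text{for } 0 < |x| < 3,\\
0 & \text{for } |x| \ge 3.
\end{cases}
\tag{22.23}
$$

4 Cornelius Lanczos (1893-1974).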


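Both window functions are shown in Fig. 22.13(a,b). The 1D interpolation kernels $w_{\mathrm{L2}}$ and $w_{\mathrm{L3}}$ are obtained as the product of the Sinc function (Eqn. (22.5)) and the associated window function (Eqn. (22.21)), that is,

$$
w_{\mathrm{L2}}(x) = \psi_{\mathrm{L2}}(x) \cdot \mathrm{Sinc}(x) =
\begin{cases}
1 & \text{for } |x| = 0,\\
2 \cdot \dfrac{\sin(\pi x / 2) \cdot \sin(\pi x)}{\pi^2 x^2} & \text{for } 0 < |x| < 2,\\
0 & \text{for } |x| \ge 2,
\end{cases}
\tag{22.24}
$$

$$
w_{\mathrm{L3}}(x) = \psi_{\mathrm{L3}}(x) \cdot \mathrm{Sinc}(x) =
\begin{cases}
1 & \text{for } |x| = 0,\\
3 \cdot \dfrac{\sin(\pi x / 3) \cdot \sin(\pi x)}{\pi^2 x^2} & \text{for } 0 < |x| < 3,\\
0 & \text{for } |x| \ge 3,
\end{cases}
\tag{22.25}
$$

respectively. In general, for Lanczos interpolation of order $n$, we get

$$
w_{\mathrm{L}n}(x) = \psi_{\mathrm{L}n}(x) \cdot \mathrm{Sinc}(x) =
\begin{cases}
1 & \text{for } |x| = 0,\\
n \cdot \dfrac{\sin(\pi x / n) \cdot \sin(\pi x)}{\pi^2 x^2} & \text{for } 0 < |x| < n,\\
0 & \text{for } |x| \ge n.
\end{cases}
\tag{22.26}
$$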
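A direct Java transcription of Eqns. (22.21) and (22.26) might look as follows. It is a sketch (class and method names are hypothetical) that also keeps the window and the Sinc function separate, so that the product form of Eqn. (22.20) can be compared with the closed form.

```java
/** Sketch of the 1D Lanczos kernel w_Ln(x) = psi_Ln(x) * Sinc(x), Eqns. (22.21)-(22.26). */
public class LanczosKernel {

    // Sinc(x) = sin(pi x) / (pi x), with Sinc(0) = 1 (Eqn. (22.5))
    static double sinc(double x) {
        if (x == 0) return 1.0;
        double px = Math.PI * x;
        return Math.sin(px) / px;
    }

    // Lanczos window psi_Ln(x): Sinc stretched to width n and truncated at |x| = n (Eqn. (22.21))
    static double psi(double x, int n) {
        if (Math.abs(x) >= n) return 0.0;
        return sinc(x / n);
    }

    // Closed form of w_Ln(x) (Eqn. (22.26)); equals psi(x, n) * sinc(x)
    static double wL(double x, int n) {
        double ax = Math.abs(x);
        if (ax == 0) return 1.0;
        if (ax >= n) return 0.0;
        double px = Math.PI * ax;
        return n * Math.sin(px / n) * Math.sin(px) / (px * px);
    }

    public static void main(String[] args) {
        for (double x = 0; x <= 3.0; x += 0.5) {
            System.out.printf("x = %.1f   w_L2 = %+.4f   w_L3 = %+.4f%n", x, wL(x, 2), wL(x, 3));
        }
    }
}
```

Like the Sinc function itself, $w_{\mathrm{L}n}$ is 1 at the origin and (up to floating-point rounding) 0 at all other integers, so Lanczos reconstruction passes through the sample points.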


Figure 22.13(c,d) shows the resulting interpolation kernels together with the original Sinc function. The function $w_{\mathrm{L2}}(x)$ is quite similar to the Catmull-Rom kernel $w_{\mathrm{crm}}(x)$ (Eqn. (22.17), Fig. 22.11), so the results can be expected to be similar as well, as shown in Fig. 22.14(a-c) (cf. Fig. 22.12(a-c)). Notice, however, the relatively poor reconstruction in the smooth signal regions (Fig. 22.14(a)) and the strong ringing introduced in the constant high-amplitude regions (Fig. 22.14(b)). The “3-tap” kernel $w_{\mathrm{L3}}(x)$ reduces these artifacts and produces steeper edges, at the cost of increased overshoot (Fig. 22.14(d-f)).

In summary, although Lanczos interpolators have seen revived interest and popularity in recent years, they do not seem to offer much (if any) advantage over other established methods, particularly the cubic, Catmull-Rom, or Mitchell-Netravali interpolations. While these are based on efficiently computable polynomial functions, Lanczos interpolation requires trigonometric functions which are relatively costly to compute, unless some form of tabulation is used.
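One possible form of such a tabulation is sketched below in Java: $w_{\mathrm{L}n}(x)$ is sampled once for $0 \le x \le n$, and values in between are obtained by linear interpolation between neighboring table entries, so the trigonometric functions are only evaluated when the table is built. The class name and the table resolution (`samplesPerUnit`) are assumptions of this sketch, not part of the original text.

```java
/** Sketch: tabulated Lanczos kernel with linear interpolation between table entries. */
public class TabulatedLanczos {
    private final int n;                 // kernel order (support is |x| < n)
    private final int samplesPerUnit;    // table resolution (an arbitrary choice)
    private final double[] table;        // table[k] = w_Ln(k / samplesPerUnit)

    public TabulatedLanczos(int n, int samplesPerUnit) {
        this.n = n;
        this.samplesPerUnit = samplesPerUnit;
        this.table = new double[n * samplesPerUnit + 1];
        for (int k = 0; k < table.length; k++) {
            table[k] = direct(k / (double) samplesPerUnit, n);  // sin() is only called here
        }
    }

    // Direct evaluation of w_Ln(x), Eqn. (22.26)
    static double direct(double x, int n) {
        double ax = Math.abs(x);
        if (ax == 0) return 1.0;
        if (ax >= n) return 0.0;
        double px = Math.PI * ax;
        return n * Math.sin(px / n) * Math.sin(px) / (px * px);
    }

    // Table lookup; the kernel is symmetric, so only |x| is stored
    public double eval(double x) {
        double ax = Math.abs(x);
        if (ax >= n) return 0.0;
        double t = ax * samplesPerUnit;
        int k = Math.min((int) t, table.length - 2);  // guard against rounding at the upper end
        double f = t - k;                              // fractional position between entries
        return (1 - f) * table[k] + f * table[k + 1];
    }

    public static void main(String[] args) {
        TabulatedLanczos w3 = new TabulatedLanczos(3, 1000);
        System.out.println(w3.eval(0.25) + " vs. " + direct(0.25, 3));
    }
}
```

With 1000 samples per unit, the table for $n = 3$ has 3001 entries; the linear-interpolation error then stays far below the quantization step of typical 8-bit images.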


Fig. 22.13

1D Lanczos interpolation kernels. Lanczos window functions $\psi_{\mathrm{L2}}$ (a), $\psi_{\mathrm{L3}}$ (b), and the corresponding interpolation kernels $w_{\mathrm{L2}}$ (c), $w_{\mathrm{L3}}$ (d). The original Sinc function (dotted curve) is shown for comparison.

Fig. 22.14

Lanczos interpolation examples: Lanczos-2 (a-c), Lanczos-3 (d-f). Note the ringing in the flat (constant) regions caused by Lanczos-2 interpolation in the left part of (b). The Lanczos-3 interpolator shows less ringing (e) but produces steeper edges at the cost of increased overshoot (e,f).


22.5 Interpolation in 2D

So far we have only looked at interpolating (or reconstructing) 1D signals from discrete samples. Images are 2D signals but, as we shall see in this section, the techniques for interpolating images are very similar and can be derived from the 1D approach. In particular, “ideal” (low-pass filter) interpolation requires a 2D Sinc function defined as

$$
\mathrm{Sinc}(x, y) = \mathrm{Sinc}(x) \cdot \mathrm{Sinc}(y) = \frac{\sin(\pi x)}{\pi x} \cdot \frac{\sin(\pi y)}{\pi y},
\tag{22.27}
$$

which is shown in Fig. 22.15(a). Just as in 1D, the 2D Sinc function is not a practical interpolation function, for the same reasons (e.g., its infinite extent). In the following, we look at some common interpolation methods for images, particularly the nearest-neighbor, bilinear, bicubic, and Lanczos interpolations, whose 1D versions were described in the previous sections.

22.5.1 Nearest-Neighbor Interpolation in 2D

The position $(u_x, v_y)$ of the pixel closest to a given continuous point $(x, y)$ is found by independently rounding the $x$ and $y$ coordinates to discrete values, that is,


Fig. 22.15

Interpolation kernels in 2D. Sinc kernel $\mathrm{Sinc}(x, y)$ (a) and nearest-neighbor kernel $W_{\mathrm{nn}}(x, y)$ (b) for $-3 \le x, y \le 3$.


$$
\hat{I}(x, y) = I(u_x, v_y),
\tag{22.28}
$$

with $u_x = \mathrm{round}(x) = \lfloor x + 0.5 \rfloor$ and $v_y = \mathrm{round}(y) = \lfloor y + 0.5 \rfloor$.

As in the 1D case, the interpolation in 2D can be described as a linear convolution (linear filter). The 2D kernel for the nearest-neighbor interpolation is, analogous to Eqn. (22.9), defined as

$$
W_{\mathrm{nn}}(x, y) =
\begin{cases}
1 & \text{for } -0.5 \le x, y < 0.5,\\
0 & \text{otherwise}.
\end{cases}
\tag{22.29}
$$


This function is shown in Fig. 22.15(b). Nearest-neighbor interpolation is known for its strong blocking effects (Fig. 22.16(b)) and thus is rarely used for geometric image operations. However, in some situations, this effect may be intended; for example, if an image is to be enlarged by replicating each pixel without any smoothing.
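The following Java sketch implements Eqn. (22.28) and uses it for exactly this kind of pixel-replicating enlargement by an integer factor $s$. The image layout `img[v][u]` (row $v$, column $u$), the clamping at the image border, and the pixel-center alignment $x = (u' + 0.5)/s - 0.5$ of the target-to-source mapping are assumptions of this sketch; the class name is hypothetical.

```java
/** Sketch: 2D nearest-neighbor interpolation (Eqn. (22.28)) and pixel-replicating enlargement. */
public class NearestNeighbor2D {

    // img[v][u]: row v, column u (storage layout is an assumption of this sketch)
    static float interpolate(float[][] img, double x, double y) {
        int u = (int) Math.floor(x + 0.5);   // u_x = round(x)
        int v = (int) Math.floor(y + 0.5);   // v_y = round(y)
        // clamp to the image border (the border policy is an assumption, not from the text)
        u = Math.max(0, Math.min(img[0].length - 1, u));
        v = Math.max(0, Math.min(img.length - 1, v));
        return img[v][u];
    }

    // Enlarge by integer factor s with target-to-source mapping; with pixel centers aligned,
    // every source pixel is replicated into an s x s block
    static float[][] enlarge(float[][] img, int s) {
        int h = img.length * s, w = img[0].length * s;
        float[][] out = new float[h][w];
        for (int vt = 0; vt < h; vt++) {
            for (int ut = 0; ut < w; ut++) {
                double x = (ut + 0.5) / s - 0.5;
                double y = (vt + 0.5) / s - 0.5;
                out[vt][ut] = interpolate(img, x, y);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        float[][] big = enlarge(new float[][] {{1, 2}, {3, 4}}, 3);
        for (float[] row : big) System.out.println(java.util.Arrays.toString(row));
    }
}
```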


Fig. 22.16

Image enlargement example. Original (a); 8× enlargement using nearest-neighbor interpolation (b) and bicubic interpolation (c).


22.5.2 Bilinear Interpolation

The 2D counterpart to the linear interpolation in 1D (see Sec. 22.1) is the so-called bilinear interpolation,⁵ whose operation is illustrated in Fig. 22.17. For the given interpolation point $(x, y)$, we first find the four closest (surrounding) pixel values,

$$
\begin{aligned}
A &= I(u_x, v_y), & B &= I(u_x + 1, v_y),\\
C &= I(u_x, v_y + 1), & D &= I(u_x + 1, v_y + 1),
\end{aligned}
\tag{22.30}
$$

5 Not to be confused with the bilinear mapping (transformation) described in Chapter 21, Sec. 21.1.5.
Fig. 22.17

Bilinear interpolation. For a given position $(x, y)$, the interpolated value is computed from the values $A, B, C, D$ of the four closest pixels in two steps (a). First the intermediate values $E$ and $F$ are computed by linear interpolation in the horizontal direction between $A, B$ and $C, D$, respectively, where $a = x - u_x$ is the distance to the nearest pixel to the left of $x$. Subsequently, the intermediate values $E, F$ are interpolated in the vertical direction, where $b = y - v_y$ is the distance to the nearest pixel below $y$. An example for the resulting surface between four adjacent pixels is shown in (b).


where $u_x = \lfloor x \rfloor$ and $v_y = \lfloor y \rfloor$. Then the pixel values $A, B, C, D$ are interpolated in the horizontal and subsequently in the vertical direction. The intermediate values $E, F$ are calculated from the distance $a = (x - u_x)$ of the specified interpolation position $(x, y)$ from the discrete raster coordinate $u_x$ as

$$
E = A + (x - u_x) \cdot (B - A) = A + a \cdot (B - A),
\tag{22.31}
$$

$$
F = C + (x - u_x) \cdot (D - C) = C + a \cdot (D - C),
\tag{22.32}
$$

and the final interpolation value $G$ is computed from the vertical distance $b = y - v_y$ as

$$
\begin{aligned}
\hat{I}(x, y) = G &= E + (y - v_y) \cdot (F - E) = E + b \cdot (F - E)\\
&= (1 - a)(1 - b)\, A + a(1 - b)\, B + (1 - a)\, b\, C + a b\, D .
\end{aligned}
\tag{22.33}
$$


Expressed as a linear convolution filter, the corresponding 2D kernel $W_{\mathrm{bil}}(x, y)$ is the product of the two 1D kernels $w_{\mathrm{lin}}(x)$ and $w_{\mathrm{lin}}(y)$ (Eqn. (22.10)), that is,

$$
W_{\mathrm{bil}}(x, y) = w_{\mathrm{lin}}(x) \cdot w_{\mathrm{lin}}(y) =
\begin{cases}
1 - |x| - |y| + |x| \cdot |y| & \text{for } 0 \le |x|, |y| < 1,\\
0 & \text{otherwise}.
\end{cases}
\tag{22.34}
$$

In this function (plotted in Fig. 22.18), we can recognize the bilinear term that gives this method its name.
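Eqns. (22.30)-(22.33) translate almost line by line into code. The Java sketch below is illustrative only: the image layout `img[v][u]`, the clamping of out-of-range pixel coordinates, and all names are assumptions.

```java
/** Sketch: bilinear interpolation following Eqns. (22.30)-(22.33). */
public class Bilinear2D {

    // Pixel access with clamping at the image border (border policy is an assumption)
    static double pixel(float[][] img, int u, int v) {
        u = Math.max(0, Math.min(img[0].length - 1, u));
        v = Math.max(0, Math.min(img.length - 1, v));
        return img[v][u];                       // img[v][u]: row v, column u
    }

    static double interpolate(float[][] img, double x, double y) {
        int ux = (int) Math.floor(x);
        int vy = (int) Math.floor(y);
        double a = x - ux;                      // horizontal distance, 0 <= a < 1
        double b = y - vy;                      // vertical distance,   0 <= b < 1
        double A = pixel(img, ux, vy),     B = pixel(img, ux + 1, vy);      // Eqn. (22.30)
        double C = pixel(img, ux, vy + 1), D = pixel(img, ux + 1, vy + 1);
        double E = A + a * (B - A);             // Eqn. (22.31)
        double F = C + a * (D - C);             // Eqn. (22.32)
        return E + b * (F - E);                 // Eqn. (22.33)
    }

    public static void main(String[] args) {
        float[][] img = {{0, 10}, {20, 30}};
        System.out.println(interpolate(img, 0.5, 0.5));   // center of the four pixels
    }
}
```

Because all four weights $(1-a)(1-b)$, $a(1-b)$, $(1-a)b$, and $ab$ are nonnegative and sum to 1, the result always lies between the smallest and largest of $A, B, C, D$.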


Fig. 22.18

2D interpolation kernels, bilinear kernel $W_{\mathrm{bil}}(x, y)$ (a) and bicubic kernel $W_{\mathrm{bic}}(x, y)$ (b) for $-3 \le x, y \le 3$.
22.5.3 Bicubic and Spline Interpolation in 2D

The convolution kernel for the 2D cubic interpolation is also defined as the product of the corresponding 1D kernels (Eqn. (22.12)),

$$
W_{\mathrm{bic}}(x, y) = w_{\mathrm{cub}}(x) \cdot w_{\mathrm{cub}}(y).
\tag{22.35}
$$

Fig. 22.19

Bicubic interpolation in two steps. The discrete image $I$ (pixel positions correspond to raster lines) is to be interpolated at some continuous position $(x, y)$. In step 1 (left), a 1D interpolation is performed in the horizontal direction with $w_{\mathrm{cub}}(x)$ over four pixels $I(u_i, v_j)$ in four lines. One intermediate result $p_j$ (marked □) is computed for each line $j$. In step 2 (right), the result $\hat{I}(x, y)$ is computed by a single cubic interpolation in the vertical direction over the intermediate results $p_0, \ldots, p_3$. In total, $16 + 4 = 20$ interpolation steps are required.


The resulting kernel is plotted in Fig. 22.18(b). Due to the decomposition into 1D kernels (Eqn. (22.35)), the computation of the bicubic interpolation is separable in $x$, $y$ and can thus be expressed as

$$
\hat{I}(x, y) = \sum_{v=\lfloor y \rfloor - 1}^{\lfloor y \rfloor + 2} \; \sum_{u=\lfloor x \rfloor - 1}^{\lfloor x \rfloor + 2} I(u, v) \cdot W_{\mathrm{bic}}(x - u, y - v)
\tag{22.36}
$$

$$
\hat{I}(x, y) = \sum_{j=0}^{3} w_{\mathrm{cub}}(y - v_j) \cdot \underbrace{\Bigl[ \sum_{i=0}^{3} I(u_i, v_j) \cdot w_{\mathrm{cub}}(x - u_i) \Bigr]}_{p_j},
\tag{22.37}
$$

with $u_i = \lfloor x \rfloor - 1 + i$ and $v_j = \lfloor y \rfloor - 1 + j$. The quantity $p_j$ is the intermediate result of the cubic interpolation in the $x$ direction in line $j$, as illustrated in Fig. 22.19. Equation (22.37) describes a simple and efficient procedure for computing the bicubic interpolation using only a 1D kernel $w_{\mathrm{cub}}(x)$. The interpolation is based on a $4 \times 4$ neighborhood of pixels and requires a total of $16 + 4 = 20$ multiply-and-add steps.

This method, which is summarized in Alg. 22.1, can be used to implement any $x/y$-separable 2D interpolation kernel of size $4 \times 4$, such as the 2D Catmull-Rom interpolation (Eqn. (22.17)) with

$$
W_{\mathrm{crm}}(x, y) = w_{\mathrm{crm}}(x) \cdot w_{\mathrm{crm}}(y)
\tag{22.38}
$$

or the Mitchell-Netravali interpolation (Eqn. (22.19)) with

$$
W_{\mathrm{mn}}(x, y) = w_{\mathrm{mn}}(x) \cdot w_{\mathrm{mn}}(y).
\tag{22.39}
$$

The corresponding 2D kernels are shown in Fig. 22.20. For interpolation with separable kernels of larger size, see the general procedure in Alg. 22.2.

Fig. 22.20

2D spline interpolation kernels: Catmull-Rom kernel $W_{\mathrm{crm}}(x, y)$ (a), Mitchell-Netravali kernel $W_{\mathrm{mn}}(x, y)$ (b), for $-3 \le x, y \le 3$.


Alg. 22.1

Bicubic interpolation of image $I$ at position $(x, y)$. The 1D cubic function $w_{\mathrm{cub}}(\cdot)$ (Eqn. (22.11)) is used for the separate interpolation in the $x$ and $y$ directions based on a neighborhood of $4 \times 4$ pixels. See Prog. 22.1 for a straightforward implementation in Java.

1: BicubicInterpolation(I, x, y, a)
   Input: I, original image; x, y ∈ ℝ, continuous position; a, control parameter.
   Returns the interpolated image value at position (x, y).
2:   q ← 0
3:   for j ← 0, ..., 3 do                       ▷ iterate over 4 lines
4:     v ← ⌊y⌋ − 1 + j
5:     p ← 0
6:     for i ← 0, ..., 3 do                     ▷ iterate over 4 columns
7:       u ← ⌊x⌋ − 1 + i
8:       p ← p + I(u, v) · w_cub(x − u, a)      ▷ see Eqn. (22.11)
9:     q ← q + p · w_cub(y − v, a)
10:  return q


22.5.4 Lanczos Interpolation in 2D

The kernels for the 2D Lanczos interpolation are also $x/y$-separable into 1D kernels (see Eqns. (22.24) and (22.25), respectively), that is,

$$
W_{\mathrm{L}n}(x, y) = w_{\mathrm{L}n}(x) \cdot w_{\mathrm{L}n}(y).
\tag{22.40}
$$

The resulting kernels for orders $n = 2$ and $n = 3$ are shown in Fig. 22.21. Because of the separability, the 2D Lanczos interpolation can be computed, similar to the bicubic interpolation, separately in the $x$ and $y$ directions. Like the bicubic kernel, the 2-tap Lanczos kernel $W_{\mathrm{L2}}$ (Eqn. (22.24)) is zero outside the interval $-2 < x, y < 2$, and thus the procedure described in Eqn. (22.37) and Alg. 22.1 can be used with only a small modification (replace $w_{\mathrm{cub}}$ by $w_{\mathrm{L2}}$).


Fig. 22.21

2D Lanczos kernels for $n = 2$ and $n = 3$: kernels $W_{\mathrm{L2}}(x, y)$ (a) and $W_{\mathrm{L3}}(x, y)$ (b), with $-3 \le x, y \le 3$.


Alg. 22.2

General interpolation with a separable interpolation kernel $W(x, y) = w_n(x) \cdot w_n(y)$ of extent $\pm n$ (i.e., the 1D kernel $w_n(x)$ is zero for $x < -n$ and $x > n$, with $n \in \mathbb{N}$). Note that procedure BicubicInterpolation in Alg. 22.1 is a special instance of this algorithm (with $n = 2$).

1: SeparableInterpolation(I, x, y, w, n)
   Input: I, original image; x, y ∈ ℝ, continuous position; w, a 1D interpolation kernel of extent ±n (n ≥ 1).
   Returns the interpolated image value at position (x, y) using the composite interpolation kernel W(x, y) = w(x) · w(y).
2:   q ← 0
3:   for j ← 0, ..., 2n−1 do                    ▷ iterate over 2n lines
4:     v ← ⌊y⌋ − n + 1 + j                      ▷ v = v_j
5:     p ← 0
6:     for i ← 0, ..., 2n−1 do                  ▷ iterate over 2n columns
7:       u ← ⌊x⌋ − n + 1 + i                    ▷ u = u_i
8:       p ← p + I(u, v) · w(x − u)
9:     q ← q + p · w(y − v)
10:  return q


Compared to Eqn. (22.37), the larger Lanczos kernel $W_{\mathrm{L3}}$ (Eqn. (22.25)) requires two additional pixel rows and columns. The calculation of the interpolated pixel value at position $(x, y)$ thus has the form

$$
\hat{I}(x, y) = \sum_{v=\lfloor y \rfloor - 2}^{\lfloor y \rfloor + 3} \; \sum_{u=\lfloor x \rfloor - 2}^{\lfloor x \rfloor + 3} I(u, v) \cdot W_{\mathrm{L3}}(x - u, y - v)
\tag{22.41}
$$

$$
\hat{I}(x, y) = \sum_{j=0}^{5} w_{\mathrm{L3}}(y - v_j) \cdot \Bigl[ \sum_{i=0}^{5} I(u_i, v_j) \cdot w_{\mathrm{L3}}(x - u_i) \Bigr],
\tag{22.42}
$$

with $u_i = \lfloor x \rfloor - 2 + i$ and $v_j = \lfloor y \rfloor - 2 + j$. Thus the L3 Lanczos interpolation in 2D uses a support region of $6 \times 6 = 36$ pixels from the original image, 20 pixels more than the bicubic interpolation.

In general, the expression for a 2D Lanczos interpolator L$n$ of arbitrary order $n \ge 1$ is

$$
\hat{I}(x, y) = \sum_{v=\lfloor y \rfloor - n + 1}^{\lfloor y \rfloor + n} \; \sum_{u=\lfloor x \rfloor - n + 1}^{\lfloor x \rfloor + n} I(u, v) \cdot W_{\mathrm{L}n}(x - u, y - v)
\tag{22.43}
$$

$$
\hat{I}(x, y) = \sum_{j=0}^{2n-1} w_{\mathrm{L}n}(y - v_j) \cdot \Bigl[ \sum_{i=0}^{2n-1} I(u_i, v_j) \cdot w_{\mathrm{L}n}(x - u_i) \Bigr],
\tag{22.44}
$$

with $u_i = \lfloor x \rfloor - n + 1 + i$ and $v_j = \lfloor y \rfloor - n + 1 + j$. The size of this interpolator's support region is $2n \times 2n$ pixels. How the expression in Eqn. (22.44) could be computed is shown in Alg. 22.2, which actually describes a general interpolation procedure that can be used with any separable interpolation kernel $W(x, y) = w_n(x) \cdot w_n(y)$ of extent $\pm n$.
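A possible Java rendering of Alg. 22.2 passes the 1D kernel as a `java.util.function.DoubleUnaryOperator`, so the same loop serves Catmull-Rom ($n = 2$) or Lanczos kernels of any order. This is a sketch under stated assumptions: the image is stored as `img[v][u]`, out-of-range coordinates are clamped to the border, and the class name is hypothetical (the imagingbook library uses its own classes, see Sec. 22.7).

```java
import java.util.function.DoubleUnaryOperator;

/** Sketch of Alg. 22.2: interpolation with a separable kernel W(x, y) = w(x) * w(y) of extent +-n. */
public class SeparableInterpolation {

    // Pixel access with clamping at the image border (border policy is an assumption)
    static double pixel(float[][] img, int u, int v) {
        u = Math.max(0, Math.min(img[0].length - 1, u));
        v = Math.max(0, Math.min(img.length - 1, v));
        return img[v][u];
    }

    static double interpolate(float[][] img, double x, double y, DoubleUnaryOperator w, int n) {
        int x0 = (int) Math.floor(x);
        int y0 = (int) Math.floor(y);
        double q = 0;
        for (int j = 0; j <= 2 * n - 1; j++) {        // iterate over 2n lines
            int v = y0 - n + 1 + j;                     // v_j
            double p = 0;
            for (int i = 0; i <= 2 * n - 1; i++) {    // iterate over 2n columns
                int u = x0 - n + 1 + i;                 // u_i
                p += pixel(img, u, v) * w.applyAsDouble(x - u);
            }
            q += p * w.applyAsDouble(y - v);            // vertical 1D interpolation over p_j
        }
        return q;
    }

    // Catmull-Rom kernel, i.e., w_cub(x, 0.5) (Eqns. (22.11), (22.17)); extent n = 2
    static double catmullRom(double x) {
        double t = Math.abs(x);
        if (t < 1) return 1.5 * t * t * t - 2.5 * t * t + 1;
        if (t < 2) return -0.5 * t * t * t + 2.5 * t * t - 4 * t + 2;
        return 0;
    }

    // Lanczos kernel of order n (Eqn. (22.26)); extent n
    static double lanczos(double x, int n) {
        double t = Math.abs(x);
        if (t == 0) return 1;
        if (t >= n) return 0;
        double pt = Math.PI * t;
        return n * Math.sin(pt / n) * Math.sin(pt) / (pt * pt);
    }

    public static void main(String[] args) {
        float[][] img = new float[8][8];
        for (int v = 0; v < 8; v++)
            for (int u = 0; u < 8; u++)
                img[v][u] = (u + v) % 2 == 0 ? 0 : 100;   // checkerboard test image
        System.out.println(interpolate(img, 3.5, 3.25, SeparableInterpolation::catmullRom, 2));
        System.out.println(interpolate(img, 3.5, 3.25, x -> lanczos(x, 3), 3));
    }
}
```

Passing the kernel and its extent separately keeps the loop independent of any particular kernel; the price is one indirect call per kernel evaluation, which a tabulated kernel would make cheap.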


22.5.5 Examples and Discussion

Figures 22.22 and 22.23 compare the interpolation methods described in this section: nearest-neighbor, bilinear, bicubic, Catmull-Rom, cubic B-spline, Mitchell-Netravali, and Lanczos interpolation. In both figures, the original images are rotated counter-clockwise by 15°. A gray background is used to visualize the edge overshoot produced by some of the interpolators.

Nearest-neighbor interpolation (Fig. 22.22(b)) creates no new pixel values but forms, as expected, coarse blocks of pixels with the same intensity.

The effect of the bilinear interpolation (Fig. 22.22(c)) is local smoothing over four neighboring pixels. The weights for these four pixels are nonnegative, and thus no result can be smaller than the smallest neighboring pixel value or greater than the greatest neighboring pixel value. In other words, bilinear interpolation cannot create any over- or undershoot at edges.

This is not the case for the bicubic interpolation (Fig. 22.22(d)): some of the coefficients in the bicubic interpolation kernel are negative, which makes pixels near edges clearly brighter or darker, respectively, thus increasing the perceived sharpness. In general, bicubic interpolation produces clearly better results than the bilinear method at comparable computing cost, and it is thus widely accepted as the standard technique and used in most image manipulation programs. By adjusting the control parameter $a$ (Eqn. (22.11)), the bicubic kernel can be easily tuned to fit the needs of particular applications. For example, the Catmull-Rom method (Fig. 22.22(e)) can be implemented with the bicubic interpolation by setting $a = 0.5$ (Eqns. (22.17) and (22.38)).

Results from the 2D Lanczos interpolation (Fig. 22.22(h)) using the 2-tap kernel $W_{\mathrm{L2}}$ cannot be much better than from the bicubic interpolation, which can be adjusted to give similar results without causing any ringing in flat regions, as seen in Fig. 22.14. The 3-tap Lanczos kernel $W_{\mathrm{L3}}$ (Fig. 22.22(i)), on the other hand, should produce slightly sharper edges at the cost of increased overshoot (see also Exercise 22.3).

In summary, for high-quality applications one should consider the Catmull-Rom (Eqns. (22.17) and (22.38)) or the Mitchell-Netravali (Eqns. (22.19) and (22.39)) methods, which offer good reconstruction at the same computational cost as the bicubic interpolation.


22.6 Aliasing

As described in Chapter 21, the usual approach for implementing geometric image transformations can be summarized by the following three steps (Fig. 22.24):

1. Each discrete image point $(u', v')$ of the target image is projected by the inverse geometric transformation $T^{-1}$ to the continuous coordinate $(x, y)$ in the source image.

2. The continuous image function $\hat{I}(x, y)$ is reconstructed from the discrete source image $I(u, v)$ by interpolation (using one of the methods described earlier).

3. The interpolated function is sampled at position $(x, y)$, and the sample value $\hat{I}(x, y)$ is transferred to the target pixel $I'(u', v')$.
Fig. 22.22

Image interpolation methods compared (line art). Panels: (a) original, (b) nearest-neighbor, (c) bilinear, (d) bicubic, (e) Catmull-Rom, (f) cubic B-spline, (g) Mitchell-Netravali, (h) Lanczos-2, (i) Lanczos-3.


22.6.1 Sampling the Interpolated Image

One problem not considered so far concerns the process of sampling the reconstructed, continuous image function in the aforementioned step 3. The problem occurs when the geometric transformation $T$ causes parts of the image to be contracted. In this case, the distance between adjacent sample points on the source image is locally increased by the corresponding inverse transformation $T^{-1}$. Now, widening the sampling distance reduces the spatial sampling rate and thus the maximum permissible frequencies in the reconstructed image function $\hat{I}(x, y)$. Eventually this leads to a violation of the sampling criterion and causes visible aliasing in the transformed image. The problem does not occur when the image is enlarged by the geometric transformation, because in this case the sampling interval on the source image is shortened (corresponding to a higher sampling frequency) and no aliasing can occur.
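A minimal 1D illustration of this effect is sketched below in Java. It assumes a global shrink by factor $s$ with the simple target-to-source mapping $x = s \cdot u'$ (pixel-center conventions are ignored) and nearest-neighbor sampling; the class and method names are hypothetical. The test signal consists of alternating one-pixel-wide dark and bright lines, the finest detail a discrete image can hold.

```java
/** Sketch: aliasing when a signal of 1-pixel lines is shrunk without low-pass filtering. */
public class AliasingDemo {

    // Alternating 1-pixel-wide dark (0) and bright (255) lines
    static double[] stripes(int length) {
        double[] g = new double[length];
        for (int u = 0; u < length; u++) g[u] = (u % 2 == 0) ? 0 : 255;
        return g;
    }

    // Shrink by factor s using target-to-source mapping x = s * u' and nearest-neighbor sampling
    static double[] shrinkNearest(double[] g, double s) {
        int m = (int) Math.floor(g.length / s);
        double[] out = new double[m];
        for (int k = 0; k < m; k++) {
            double x = s * k;                          // source position of target sample k
            int u = (int) Math.floor(x + 0.5);         // nearest source sample
            out[k] = g[Math.min(g.length - 1, u)];
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(shrinkNearest(stripes(16), 2.0)));
        System.out.println(java.util.Arrays.toString(shrinkNearest(stripes(16), 1.5)));
    }
}
```

With $s = 2$ every sampled position is even, so all bright lines are missed and the output is uniformly 0. With $s = 1.5$ the sampled indices are 0, 2, 3, 5, 6, 8, ..., which produces a spurious pattern with period 4 in the target that is not present in the source.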

Note that this effect is largely unrelated to the interpolation method, as demonstrated by the examples in Fig. 22.25. The effect is most noticeable under nearest-neighbor interpolation in Fig. 22.25(b), where the thin lines are simply not “hit” by the widened sampling raster and thus disappear in some places. Important image information is thereby lost. The bilinear and bicubic interpolation methods in Fig. 22.25(c,d) have wider interpolation kernels but still cannot avoid the aliasing effect. The problem, of course, gets worse with increasing reduction factors.

Fig. 22.23

Image interpolation methods compared (text image). Panels: (a) original, (b) nearest-neighbor, (c) bilinear, (d) bicubic, (e) Catmull-Rom, (f) cubic B-spline, (g) Mitchell-Netravali, (h) Lanczos-2, (i) Lanczos-3.

Fig. 22.24

Sampling errors in geometric operations. If the geometric transformation $T$ leads to a local contraction of the image (which corresponds to a local enlargement by $T^{-1}$), the distance between adjacent sample points in $I$ is increased. This reduces the local sampling frequency and thus the maximum signal frequency allowed in the source image, which eventually leads to aliasing.


22.6.2 Low-Pass Filtering

One solution to the aliasing problem is to make sure that the interpolated image function is properly frequency-limited before it gets resampled. This can be accomplished with a suitable low-pass filter, as illustrated in Fig. 22.26.

Fig. 22.25

Aliasing caused by local image contraction. Aliasing is caused by a violation of the sampling criterion and is largely unaffected by the interpolation method used: complete transformed image (a), detail using nearest-neighbor interpolation (b), bilinear interpolation (c), and bicubic interpolation (d).

Fig. 22.26

Low-pass filtering to avoid aliasing in geometric operations. After interpolation (step 1), the reconstructed image function is subjected to low-pass filtering (step 2) before being resampled (step 3).

The cutoff frequency of the low-pass filter is determined by the amount of local scale change, which may, depending upon the type of transformation, be different in various parts of the image. In the simplest case, the amount of scale change is the same throughout the image (e.g., under global scaling or affine transformations), and the same filter can be used everywhere in the image. In general, however, the low-pass filter is space-variant or nonhomogeneous, and the local filter parameters are determined by the transformation $T$ and the current image position. If convolution filters are used for both interpolation and low-pass filtering, they could be combined into a common, space-variant reconstruction filter.
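For the simplest case of a global reduction by a factor $s \ge 1$, one common way to combine interpolation and low-pass filtering is to stretch the interpolation kernel by $s$, which lowers its cutoff frequency by the same factor. The 1D Java sketch below does this with the linear (tent) kernel $w_{\mathrm{lin}}$; the choice of kernel, the renormalization of the weights, and all names are assumptions for illustration, not the method prescribed in the text.

```java
/** Sketch: shrinking a 1D signal with a low-pass prefilter obtained by stretching the tent kernel. */
public class PrefilteredShrink {

    // Linear interpolation kernel w_lin(x) (Eqn. (22.10))
    static double wLin(double x) {
        double t = Math.abs(x);
        return t < 1 ? 1 - t : 0;
    }

    // Shrink g by a global factor s >= 1. Stretching the kernel by s lowers its cutoff
    // frequency by the same factor, so interpolation and low-pass filtering are combined.
    static double[] shrink(double[] g, double s) {
        int m = (int) Math.floor(g.length / s);
        double[] out = new double[m];
        int r = (int) Math.ceil(s);                   // stretched kernel support is |x - u| < s
        for (int k = 0; k < m; k++) {
            double x = k * s;                         // target-to-source mapping
            int u0 = (int) Math.floor(x);
            double sum = 0, wsum = 0;
            for (int u = u0 - r; u <= u0 + r; u++) {
                if (u < 0 || u >= g.length) continue; // samples outside the signal are skipped
                double wk = wLin((x - u) / s);
                sum += wk * g[u];
                wsum += wk;
            }
            out[k] = sum / wsum;                      // renormalize (matters only near the borders)
        }
        return out;
    }

    public static void main(String[] args) {
        double[] g = new double[16];
        for (int u = 0; u < g.length; u++) g[u] = (u % 2 == 0) ? 0 : 255;   // 1-pixel lines
        System.out.println(java.util.Arrays.toString(shrink(g, 2.0)));
    }
}
```

For alternating 0/255 one-pixel lines shrunk by $s = 2$, every interior output value is 127.5, the average gray level of the lines, whereas plain nearest-neighbor sampling would return only one of the two line intensities and lose the other entirely.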

Unfortunately, space-variant filtering is computationally expensive and thus is often avoided, even in professional applications (e.g., Adobe Photoshop). The technique is nevertheless used in certain applications, such as high-quality texture mapping in computer graphics [75, 105, 256]. Integral images, as described in Chapter 3, Sec. 3.8, can be used to implement efficient space-variant smoothing filters.

Table 22.1

Admissible values for InterpolationMethod and associated interpolator types returned by the static create(im) method of PixelInterpolator.


22.7 Java Implementation

Implementations of most interpolation methods described in this chapter are openly available as part of the imagingbook library.⁶ The following interpolators are available as subclasses of the abstract class PixelInterpolator:

BicubicInterpolator,
BilinearInterpolator,
LanczosInterpolator,
NearestNeighborInterpolator,
SplineInterpolator.

For illustration, the complete implementation of the class BicubicInterpolator is shown in Prog. 22.1.


PixelInterpolator (class)

This class provides the functionality for interpolating images with scalar pixel values. It defines the following methods:

static PixelInterpolator create(InterpolationMethod im)
    Factory method which creates and returns a new interpolator. Admissible values for the parameter im and associated interpolator types (subclasses of PixelInterpolator) are listed in Table 22.1.

float getInterpolatedValue(ImageAccessor.Scalar ia, double x, double y)
    Returns the interpolated pixel value at the continuous position x, y of the scalar-valued image (referenced by the image accessor ia).


InterpolationMethod im    Interpolator type
NearestNeighbor           NearestNeighborInterpolator()
Bilinear                  BilinearInterpolator()
Bicubic                   BicubicInterpolator(1.00)
BicubicSmooth             BicubicInterpolator(0.25)
BicubicSharp              BicubicInterpolator(1.75)
CatmullRom                SplineInterpolator(0.5, 0.0)
CubicBSpline              SplineInterpolator(0.0, 1.0)
MitchellNetravali         SplineInterpolator(1.0/3, 1.0/3)
Lanzcos2                  LanczosInterpolator(2)
Lanzcos3                  LanczosInterpolator(3)
Lanzcos4                  LanczosInterpolator(4)

6 Package imagingbook.lib.interpolation.

 1  package imagingbook.lib.interpolation;
 2
 3  import imagingbook.lib.image.ImageAccessor;
 4  import java.awt.geom.Point2D;
 5
 6  public class BicubicInterpolator
 7          extends PixelInterpolator {
 8
 9      private final double a;  // sharpness value
10
11      public BicubicInterpolator() {
12          this(0.5);
13      }
14      public BicubicInterpolator(double a) {
15          this.a = a;
16      }
17
18      public float getInterpolatedValue(
19              ImageAccessor.Scalar ia, double x, double y) {
20          final int u0 = (int) Math.floor(x);
21          final int v0 = (int) Math.floor(y);
22          double q = 0;
23          for (int j = 0; j <= 3; j++) {
24              int v = v0 - 1 + j;
25              double p = 0;
26              for (int i = 0; i <= 3; i++) {
27                  int u = u0 - 1 + i;
28                  float pixval = ia.getVal(u, v);
29                  p = p + pixval * w_cub(x - u, a);
30              }
31              q = q + p * w_cub(y - v, a);
32          }
33          return (float) q;
34      }
35
36      private final double w_cub(double x, double a) {
37          if (x < 0)
38              x = -x;
39          double z = 0;
40          if (x < 1)
41              z = (-a + 2) * x * x * x + (a - 3) * x * x + 1;
42          else if (x < 2)
43              z = -a * x * x * x + 5 * a * x * x
44                  - 8 * a * x + 4 * a;
45          return z;
46      }
47  }

Prog. 22.1

Java implementation of bicubic interpolation (class BicubicInterpolator), as defined in Alg. 22.1. The class provides two constructors: a default constructor (line 11) with sharpness value $a = 0.5$ and a general constructor for arbitrary $a$ (line 14). The actual pixel interpolation is performed by method getInterpolatedValue() in line 18, which implements Alg. 22.1. w_cub() in line 36 is the 1D cubic interpolation function (see Eqn. (22.11)).


The class PixelInterpolator is primarily used by the methods in class ImageAccessor.⁷ See Prog. 22.2 for a basic usage example.

7 The ImageAccessor class (in package imagingbook.lib.image) provides unified access to all types of images available in ImageJ and also supports pixel interpolation.
Prog. 22.2

Image interpolation example using class ImageAccessor. This ImageJ plugin translates the input image by some (non-integer) distance dx, dy. It uses target-to-source mapping and pixel interpolation of type BicubicSharp (see Table 22.1). The required ImageAccessor (interpolator) object for the source image is created in line 31, another for the target image in line 34. This is followed by an iteration over all pixels of the target image. The source image is interpolated (line 41) at the calculated positions (x, y) and the resulting float[] value is inserted into the target image with setPix() in line 42. Note that this plugin is generic, that is, it works for all image types.


 1  import ij.ImagePlus;
 2  import ij.plugin.filter.PlugInFilter;
 3  import ij.process.ImageProcessor;
 4  import imagingbook.lib.image.ImageAccessor;
 5  import imagingbook.lib.image.OutOfBoundsStrategy;
 6  import static imagingbook.lib.image.OutOfBoundsStrategy.*;
 7  import imagingbook.lib.interpolation.InterpolationMethod;
 8  import static imagingbook.lib.interpolation.InterpolationMethod.*;
 9
10  public class Interpolator_Demo implements PlugInFilter {
11
12    static double dx = 0.5;   // translation
13    static double dy = -3.5;
14
15    static OutOfBoundsStrategy OBS = NearestBorder;
16    static InterpolationMethod IPM = BicubicSharp;
17
18    public int setup(String arg, ImagePlus imp) {
19      return DOES_ALL + NO_CHANGES;
20    }
21
22    public void run(ImageProcessor source) {
23      final int w = source.getWidth();
24      final int h = source.getHeight();
25
26      // create the target image (same type as source):
27      ImageProcessor target = source.createProcessor(w, h);
28
29      // create an ImageAccessor for the source image:
30      ImageAccessor sA =
31          ImageAccessor.create(source, OBS, IPM);
32
33      // create an ImageAccessor for the target image:
34      ImageAccessor tA = ImageAccessor.create(target);
35
36      // iterate over all pixels of the target image:
37      for (int u = 0; u < w; u++) {
38        for (int v = 0; v < h; v++) {
39          double x = u + dx;   // continuous source position (x,y)
40          double y = v + dy;
41          float[] val = sA.getPix(x, y);
42          tA.setPix(u, v, val);   // update the target pixel
43        }
44      }
45
46      // display the target image:
47      (new ImagePlus("Target", target)).show();
48    }
49  }
22.8 Exercises


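Exercise 22.1. The 1D interpolation function by Mitchell and Netravali, $w_{\mathrm{mn}}(x)$, is defined as a general spline function $w_{\mathrm{cs}}(x, a, b)$ (Eqn. (22.19)). Show that this function can be expressed as the weighted sum of a Catmull-Rom function $w_{\mathrm{crm}}(x)$ (Eqn. (22.17)) and a cubic B-spline $w_{\mathrm{cbs}}(x)$ (Eqn. (22.18)) in the form

$$ w_{\mathrm{mn}}(x) = w_{\mathrm{cs}}\bigl(x, \tfrac{1}{3}, \tfrac{1}{3}\bigr) = \tfrac{2}{3} \cdot w_{\mathrm{cs}}(x, 0.5, 0) + \tfrac{1}{3} \cdot w_{\mathrm{cs}}(x, 0, 1) = \tfrac{2}{3} \cdot w_{\mathrm{crm}}(x) + \tfrac{1}{3} \cdot w_{\mathrm{cbs}}(x) . \tag{22.45} $$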
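Before attempting the proof, Eqn. (22.45) can be checked numerically. The sketch below assumes that $w_{\mathrm{cs}}(x, a, b)$ has the standard Mitchell-Netravali form, with a playing the role of Mitchell and Netravali's parameter C and b that of B (this agrees with $w_{\mathrm{cs}}(x, 0.5, 0)$ being the Catmull-Rom kernel and $w_{\mathrm{cs}}(x, 0, 1)$ the cubic B-spline); the class and method names are hypothetical. A numerical check of this kind is no substitute for the requested derivation.

```java
/** Hypothetical numerical check of Eqn. (22.45). */
final class SplineCheck {

    /** Cubic spline kernel in Mitchell-Netravali form (assumed: a = C, b = B). */
    static double wCs(double x, double a, double b) {
        x = Math.abs(x);
        double x2 = x * x, x3 = x2 * x;
        if (x < 1) {
            return ((-6 * a - 9 * b + 12) * x3 + (6 * a + 12 * b - 18) * x2 + (6 - 2 * b)) / 6;
        } else if (x < 2) {
            return ((-6 * a - b) * x3 + (30 * a + 6 * b) * x2
                    + (-48 * a - 12 * b) * x + (24 * a + 8 * b)) / 6;
        }
        return 0;
    }

    /** Largest deviation between both sides of Eqn. (22.45) for x sampled in [-2.5, 2.5]. */
    static double maxDeviation() {
        double maxDev = 0;
        for (int k = -250; k <= 250; k++) {
            double x = k / 100.0;
            double lhs = wCs(x, 1.0 / 3, 1.0 / 3);                           // Mitchell-Netravali
            double rhs = 2.0 / 3 * wCs(x, 0.5, 0) + 1.0 / 3 * wCs(x, 0, 1);  // 2/3 Catmull-Rom + 1/3 B-spline
            maxDev = Math.max(maxDev, Math.abs(lhs - rhs));
        }
        return maxDev;
    }
}
```

The deviation stays at the level of floating-point rounding because, in this form, the polynomial coefficients depend linearly on a and b.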


Exercise 22.2. Implement an “ideal” (low-pass) pixel interpolator based on the Sinc function (see Eqn. (22.5)). Assume that the image function is periodic along both coordinate axes. Determine (by truncating the Sinc function at a finite distance) the minimum number of samples to include, and whether the result improves by including additional samples. Use the class BicubicInterpolator (Prog. 22.1) as a template for your implementation.


Exercise 22.3. Implement the 2D Lanczos interpolation with a Lanczos-3 kernel, as defined in Eqn. (22.42), as a Java class analogous to the class BicubicInterpolator (Prog. 22.1). Compare the results to the bicubic interpolation.


Exercise 22.4. The 1D Lanczos interpolation kernel of order n = 4 is (analogous to Eqn. (22.25)) defined as

$$ w_{L4}(x) = \begin{cases} 1 & \text{for } |x| = 0, \\[4pt] 4 \cdot \dfrac{\sin\bigl(\pi \frac{x}{4}\bigr) \cdot \sin(\pi x)}{\pi^2 x^2} & \text{for } 0 < |x| < 4, \\[4pt] 0 & \text{for } |x| \ge 4. \end{cases} \tag{22.46} $$

Extend the 2D kernel in Eqn. (22.42) to $w_{L4}$ and implement this interpolator as a Java class analogous to BicubicInterpolator (Prog. 22.1). How many image pixels does the calculation include at each position? See if there is any noticeable improvement over the bicubic and the Lanczos-3 interpolation (Exercise 22.3).
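As a possible starting point for this exercise, the 1D kernel in Eqn. (22.46) can be written for a general order n as shown below (n = 4 gives $w_{L4}$, n = 3 the Lanczos-3 kernel). This is a sketch with hypothetical names, not the book's implementation. The 2D extension and the interpolator class are left to the exercise.

```java
/** Hypothetical helper for the 1D Lanczos kernel of order n. */
final class LanczosKernel {

    /** w_Ln(x) = n * sin(pi x / n) * sin(pi x) / (pi^2 x^2) for 0 < |x| < n, 1 at x = 0, else 0. */
    static double wL(double x, int n) {
        double ax = Math.abs(x);
        if (ax == 0) {
            return 1.0;                       // limit value at the origin
        } else if (ax < n) {
            double px = Math.PI * ax;
            return n * Math.sin(px / n) * Math.sin(px) / (px * px);
        }
        return 0.0;                           // kernel support is limited to |x| < n
    }
}
```

Since the kernel is symmetric, using |x| internally is sufficient; it vanishes (up to rounding) at all nonzero integer positions, so the interpolated function passes through the original samples.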




23 


Image  Matching  and  Registration 


When we compare two images, we are faced with the following basic question: when are two images the same or similar, and how can this similarity be measured? Of course, one could trivially define two images $I_1, I_2$ as being identical when all pixel values are the same (i.e., the difference $I_1 - I_2$ is zero). Although this kind of definition may be useful in specific applications, such as for detecting changes in successive images under constant lighting and camera conditions, simple pixel differencing is usually too inflexible to be of much practical use. Noise, quantization errors, small changes in lighting, and minute shifts or rotations can all create large numerical pixel differences for pairs of images that would still be perceived as perfectly identical by a human viewer. Obviously, human perception incorporates a much wider concept of similarity and uses cues such as structure and content to recognize similarity between images, even when a direct comparison between individual pixels would not indicate any match. Comparing images at a structural or semantic level is a difficult problem and an interesting research field, for example, in the context of image-based searches on the Internet or database retrieval.

This chapter deals with the much simpler problem of comparing images at the pixel level; in particular, localizing a given subimage (often called a “template”) within some larger image. This task is frequently required, for example, to find matching patches in stereo images, to localize a particular pattern in a scene, or to track a certain pattern through an image sequence. The principal idea behind “template matching” is simple: move the given pattern (template) over the search image, measure the difference against the corresponding subimage at each position, and record those positions where the highest similarity is obtained. But this is not as simple as it may initially sound. After all, what is a suitable distance measure, what total difference is acceptable for a match, and what happens when brightness or contrast changes?

We already touched on this problem of invariance under geometric transformations when we discussed the shape properties of segmented regions in Chapter 10, Sec. 10.4.2. However, geometric invariance is not our main concern in the remainder of this chapter, where we describe only the most basic template-matching techniques: correlation-based methods for intensity images and “chamfer matching” for binary images.

Fig. 23.1
Geometry of template matching. The reference image R is shifted across the search image I by an offset (r, s) using the origins of the two images as the reference points. The dimensions of the search image ($M_I \times N_I$) and the reference image ($M_R \times N_R$) determine the maximal search region for this comparison.


23.1 Template Matching in Intensity Images

First we look at the problem of localizing a given reference image (template) R within a larger intensity (grayscale) image I, which we call the search image. The task is to find those positions where the contents of the reference image R and the corresponding subimage of I are either the same or most similar. If we denote by

$$ R_{r,s}(u,v) = R(u - r, v - s) \tag{23.1} $$

the reference image R shifted by the distance (r, s) in the horizontal and vertical directions, respectively, then the matching problem (illustrated in Fig. 23.1) can be summarized as follows:

• Given are the search image I and the reference image R. Find the offset $(r,s) \in \mathbb{Z}^2$ such that the similarity between the shifted reference image $R_{r,s}$ and the corresponding subimage of I is a maximum.

To successfully solve this task, several issues need to be addressed, such as determining a minimum similarity value for accepting a match and developing a good search strategy for finding the optimal displacement. First, and most important, a suitable measure of similarity between subimages must be found that is reasonably tolerant against intensity and contrast variations.

23.1.1  Distance  between  Image  Patterns 

To quantify the amount of agreement, we compute a “distance” d(r, s) between the shifted reference image R and the corresponding subimage of I for each offset position (r, s) (Fig. 23.2). Several distance measures have been proposed for 2D intensity images, including the following three basic definitions:1

Fig. 23.2
Measuring the distance between 2D image functions. The reference image R is positioned at offset (r, s) on top of the search image I.

Sum of absolute differences:

$$ d_A(r,s) = \sum_{(i,j)\in R} \bigl| I(r+i, s+j) - R(i,j) \bigr| . \tag{23.2} $$

Maximum difference:

$$ d_M(r,s) = \max_{(i,j)\in R} \bigl| I(r+i, s+j) - R(i,j) \bigr| . \tag{23.3} $$

Sum of squared differences:

$$ d_E(r,s) = \Bigl[ \sum_{(i,j)\in R} \bigl( I(r+i, s+j) - R(i,j) \bigr)^2 \Bigr]^{1/2} . \tag{23.4} $$


Note that the expression in Eqn. (23.4) is nothing but the Euclidean distance between two N-dimensional vectors of pixel values. Similarly, the sum of absolute differences in Eqn. (23.2) is equivalent to the $L_1$ distance, and the maximum difference in Eqn. (23.3) equals the $L_\infty$ distance norm.2
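The three distance measures translate directly into code. The following minimal sketch assumes that both images are given as float[][] arrays indexed as [u][v] (the same convention as the map C[r][s] returned in Prog. 23.1) and that the offset (r, s) keeps R entirely inside I; the class and method names are hypothetical.

```java
/** Hypothetical helpers for Eqns. (23.2)-(23.4); images are float[][] indexed [u][v]. */
final class PatternDistances {

    /** d_A(r,s): sum of absolute differences, Eqn. (23.2). */
    static double sumAbsDiff(float[][] I, float[][] R, int r, int s) {
        double sum = 0;
        for (int i = 0; i < R.length; i++) {
            for (int j = 0; j < R[0].length; j++) {
                sum += Math.abs((double) I[r + i][s + j] - R[i][j]);
            }
        }
        return sum;
    }

    /** d_M(r,s): maximum absolute difference, Eqn. (23.3). */
    static double maxDiff(float[][] I, float[][] R, int r, int s) {
        double max = 0;
        for (int i = 0; i < R.length; i++) {
            for (int j = 0; j < R[0].length; j++) {
                max = Math.max(max, Math.abs((double) I[r + i][s + j] - R[i][j]));
            }
        }
        return max;
    }

    /** d_E(r,s): Euclidean distance (root of the sum of squared differences), Eqn. (23.4). */
    static double euclidean(float[][] I, float[][] R, int r, int s) {
        double sum = 0;
        for (int i = 0; i < R.length; i++) {
            for (int j = 0; j < R[0].length; j++) {
                double d = (double) I[r + i][s + j] - R[i][j];
                sum += d * d;
            }
        }
        return Math.sqrt(sum);
    }
}
```

The best match is then the offset (r, s) at which the chosen distance is smallest, with (r, s) ranging over $0 \ldots M_I - M_R$ and $0 \ldots N_I - N_R$.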


Distance and correlation

Because of its formal properties, the N-dimensional distance $d_E$ (Eqn. (23.4)) is of special importance and well known in statistics and optimization. To find the best-matching position between the reference image R and the search image I, it is sufficient to minimize the square of $d_E$ (since $d_E$ is never negative, both have their minimum at the same position), which can be expanded to

$$ \begin{aligned} d_E^2(r,s) &= \sum_{(i,j)\in R} \bigl( I(r+i, s+j) - R(i,j) \bigr)^2 \\ &= \underbrace{\sum_{(i,j)\in R} I^2(r+i, s+j)}_{A(r,s)} + \underbrace{\sum_{(i,j)\in R} R^2(i,j)}_{B} - 2 \cdot \underbrace{\sum_{(i,j)\in R} I(r+i, s+j) \cdot R(i,j)}_{C(r,s)} . \end{aligned} \tag{23.5} $$

1 We use the short notation $(i,j) \in R$ to specify the set of all possible template coordinates, that is, $\{(i,j) \mid 0 \le i < M_R,\ 0 \le j < N_R\}$.

2 See also Sec. B.1.2 in the Appendix.
Notice that the term B in Eqn. (23.5) is the sum of the squared pixel values in the reference image R, a constant value (independent of r, s) that can thus be ignored. The term A(r, s) is the sum of the squared values within the subimage of I at the current offset (r, s). C(r, s) is the so-called linear cross correlation ($\circledast$) between I and R, which is defined in the general case as

$$ (I \circledast R)(r,s) = \sum_{i=-\infty}^{\infty} \sum_{j=-\infty}^{\infty} I(r+i, s+j) \cdot R(i,j) , \tag{23.6} $$

which (since R and I are assumed to have zero values outside their boundaries) is, furthermore, equivalent to

$$ \sum_{i=0}^{M_R-1} \sum_{j=0}^{N_R-1} I(r+i, s+j) \cdot R(i,j) = \sum_{(i,j)\in R} I(r+i, s+j) \cdot R(i,j) \tag{23.7} $$

and thus the same as C(r, s) in Eqn. (23.5). As we can see in Eqn. (23.6), correlation is in principle the same operation as linear convolution (see Ch. 5, Eqn. (5.16)), the only difference being that convolution implicitly mirrors the kernel, whereas correlation (with R(i, j) in the role of the kernel) does not.

If we assume for a minute that A(r, s), the “signal energy” in Eqn. (23.5), is constant throughout the image I, then A(r, s) can also be ignored and the position of maximum cross correlation C(r, s) coincides with the best match between R and I. In this case, the minimum of $d_E^2(r,s)$ (Eqn. (23.5)) can be found by computing only the maximum value of the correlation $I \circledast R$. This could be interesting for practical reasons, since linear convolution (and thus correlation) with large kernels can be computed very efficiently in the frequency domain (see also Ch. 19, Sec. 19.5).
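The decomposition in Eqn. (23.5) is easy to verify numerically. The following sketch computes the three terms A(r, s), B, and C(r, s) for one offset and compares their combination with the squared Euclidean distance computed from its definition. Images are assumed to be float[][] arrays indexed [u][v], and the class and method names are hypothetical.

```java
/** Hypothetical check of Eqn. (23.5): d_E^2(r,s) = A(r,s) + B - 2 C(r,s). */
final class DistanceExpansion {

    /** Returns {A(r,s), B, C(r,s)} for the offset (r, s). */
    static double[] terms(float[][] I, float[][] R, int r, int s) {
        double A = 0, B = 0, C = 0;
        for (int i = 0; i < R.length; i++) {
            for (int j = 0; j < R[0].length; j++) {
                double a = I[r + i][s + j], b = R[i][j];
                A += a * a;   // energy of the current subimage of I
                B += b * b;   // energy of the template (independent of r, s)
                C += a * b;   // linear cross correlation, Eqn. (23.7)
            }
        }
        return new double[] {A, B, C};
    }

    /** Squared Euclidean distance computed directly from its definition. */
    static double dE2(float[][] I, float[][] R, int r, int s) {
        double sum = 0;
        for (int i = 0; i < R.length; i++) {
            for (int j = 0; j < R[0].length; j++) {
                double d = (double) I[r + i][s + j] - R[i][j];
                sum += d * d;
            }
        }
        return sum;
    }
}
```

For a 3 × 3 image I with values 1 … 9 (row by row) and the 2 × 2 template R = [[1, 0], [2, 3]] at offset (1, 0), the sketch gives A = 154, B = 14, and C = 42, so that A + B − 2C = 84, which equals the directly computed squared distance.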


Normalized cross correlation

Unfortunately, the assumption made earlier that A(r, s) is constant does not hold for most images, and thus the result of the cross correlation varies strongly with intensity changes in the image I. The normalized cross correlation $C_N(r,s)$ compensates for this dependency by taking into account the energy in the reference image and the current subimage:

$$ C_N(r,s) = \frac{C(r,s)}{\sqrt{A(r,s) \cdot B}} = \frac{C(r,s)}{\sqrt{A(r,s)} \cdot \sqrt{B}} \tag{23.8} $$

$$ C_N(r,s) = \frac{\sum_{(i,j)\in R} I(r+i, s+j) \cdot R(i,j)}{\Bigl[ \sum_{(i,j)\in R} I^2(r+i, s+j) \Bigr]^{1/2} \cdot \Bigl[ \sum_{(i,j)\in R} R^2(i,j) \Bigr]^{1/2}} . \tag{23.9} $$

If the values in the search and reference images are all positive (which is usually the case), then the result of $C_N(r,s)$ is always in the range [0, 1], independent of the remaining contents in I and R. In this case, the result $C_N(r,s) = 1$ indicates a maximum match between R and the current subimage of I at the offset (r, s), while $C_N(r,s) = 0$ signals no agreement. Thus the normalized correlation has the additional advantage of delivering a standardized match value that can be used directly (using a suitable threshold between 0 and 1) to decide about the acceptance or rejection of a match position.
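A direct implementation of Eqn. (23.9) is sketched below, again with hypothetical names and float[][] arrays indexed [u][v]. Returning 0 when one of the energy terms vanishes is our own convention for this sketch, since the text does not address that case.

```java
/** Hypothetical implementation of the normalized cross correlation, Eqn. (23.9). */
final class NormalizedCrossCorrelation {

    /** C_N(r,s); returns 0 if the subimage or the template has zero energy (assumption). */
    static double cn(float[][] I, float[][] R, int r, int s) {
        double sumIR = 0, sumI2 = 0, sumR2 = 0;
        for (int i = 0; i < R.length; i++) {
            for (int j = 0; j < R[0].length; j++) {
                double a = I[r + i][s + j], b = R[i][j];
                sumIR += a * b;   // C(r,s)
                sumI2 += a * a;   // A(r,s)
                sumR2 += b * b;   // B
            }
        }
        double denom = Math.sqrt(sumI2) * Math.sqrt(sumR2);
        return (denom > 0) ? sumIR / denom : 0;
    }
}
```

For example, with I = [[1, 2, 3], [4, 5, 6], [7, 8, 9]] and the 2 × 2 template R = [[5, 6], [8, 9]], $C_N$ is 1 at the true offset (1, 0 + 1) = (1, 1) but still about 0.97 at the clearly mismatching offset (0, 0), because it compares absolute intensities rather than intensity variations.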

In contrast to the “global” cross correlation in Eqn. (23.6), the expression in Eqn. (23.8) is a “local” distance measure. However, it, too, has the problem of measuring the absolute distance between the template and the subimage. If, for example, the overall intensity of the image I is altered, then even the normalized cross correlation $C_N(r,s)$ may change dramatically.

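Correlation coefficient

One solution to this problem is to compare not the original function values but the differences with respect to the average value of R and the average of the current subimage of I. This modification turns Eqn. (23.8) into

$$ C_L(r,s) = \frac{\sum_{(i,j)\in R} \bigl( I(r+i, s+j) - \bar{I}_{r,s} \bigr) \cdot \bigl( R(i,j) - \bar{R} \bigr)}{\Bigl[ \sum_{(i,j)\in R} \bigl( I(r+i, s+j) - \bar{I}_{r,s} \bigr)^2 \Bigr]^{1/2} \cdot \Bigl[ \underbrace{\sum_{(i,j)\in R} \bigl( R(i,j) - \bar{R} \bigr)^2}_{S_R^2 \,=\, K \cdot \sigma_R^2} \Bigr]^{1/2}} , \tag{23.10} $$

with the average values $\bar{I}_{r,s}$ and $\bar{R}$ defined as

$$ \bar{I}_{r,s} = \frac{1}{K} \sum_{(i,j)\in R} I(r+i, s+j) \quad \text{and} \quad \bar{R} = \frac{1}{K} \sum_{(i,j)\in R} R(i,j) , \tag{23.11} $$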

respectively (with K = |R| being the size of the reference image R). In statistics, the expression in Eqn. (23.10) is known as the correlation coefficient. However, unlike its usual application as a global measure in statistics, $C_L(r,s)$ describes a local, piecewise correlation between the template R and the current subimage (at offset r, s) of I. The resulting values of $C_L(r,s)$ are in the range [−1, 1] regardless of the contents in R and I. Again, a value of 1 indicates maximum agreement between the compared image patterns, while −1 corresponds to a maximum mismatch (the subimage is an intensity-inverted version of the template, up to scale and offset). The term

$$ S_R^2 = K \cdot \sigma_R^2 = \sum_{(i,j)\in R} \bigl( R(i,j) - \bar{R} \bigr)^2 \tag{23.12} $$

in the denominator of Eqn. (23.10) is K times the variance ($\sigma_R^2$) of the values in the template R, which is constant and thus needs to be computed only once. Since $\sigma_R^2 = \frac{1}{K} \sum_{(i,j)\in R} R^2(i,j) - \bar{R}^2$, the expression in Eqn. (23.12) can be reformulated as

$$ S_R^2 = \sum_{(i,j)\in R} R^2(i,j) - K \cdot \bar{R}^2 \tag{23.13} $$

$$ S_R^2 = \sum_{(i,j)\in R} R^2(i,j) - \frac{1}{K} \Bigl[ \sum_{(i,j)\in R} R(i,j) \Bigr]^2 . \tag{23.14} $$
By inserting the results from Eqns. (23.11) and (23.14), we can rewrite Eqn. (23.10) as

$$ C_L(r,s) = \frac{\sum_{(i,j)\in R} \bigl( I(r+i, s+j) \cdot R(i,j) \bigr) - K \cdot \bar{I}_{r,s} \cdot \bar{R}}{\Bigl[ \sum_{(i,j)\in R} I^2(r+i, s+j) - K \cdot \bar{I}_{r,s}^{\,2} \Bigr]^{1/2} \cdot S_R} \tag{23.15} $$

and thereby obtain an efficient way to compute the local correlation coefficient. Since $\bar{R}$ and $S_R = \sqrt{S_R^2}$ need to be calculated only once, and the local average $\bar{I}_{r,s}$ of the current subimage is not needed until all sums have been accumulated, the whole expression in Eqn. (23.15) can be computed in one common iteration, as shown in Alg. 23.1.

Note that in the calculation of $C_L(r,s)$ in Eqn. (23.15), the denominator becomes zero if either of the two factors is zero. This may happen, for example, if the search image I is locally “flat” and thus has zero variance, or if the reference image R is constant. The quantity 1 is added to the denominator in Alg. 23.1 (line 23) to avoid divisions by zero in such cases. Otherwise it has no significant effect on the result, provided that the product of the two factors is much larger than 1, as is usually the case for textured image regions with values in [0, 255].

A direct Java implementation of this procedure is shown in Progs. 23.1 and 23.2 in Sec. 23.1.3 (class CorrCoeffMatcher).
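Because Progs. 23.1 and 23.2 depend on ImageJ's FloatProcessor, the following self-contained sketch re-expresses Alg. 23.1 for plain float[][] arrays (indexed [u][v]). It follows the one-pass scheme of Eqn. (23.15), including the quantity 1 added to the denominator. Clamping the two variance terms to non-negative values before taking the square root is our own addition to guard against rounding errors, and the class name is hypothetical.

```java
/** Hypothetical stand-alone version of Alg. 23.1; images are float[][] indexed [u][v]. */
final class CorrCoeffSketch {

    static float[][] correlationMap(float[][] I, float[][] R) {
        int MI = I.length, NI = I[0].length;
        int MR = R.length, NR = R[0].length;
        int K = MR * NR;

        // Step 1: mean and variance term S_R of the template (Eqns. 23.11, 23.14)
        double sumR = 0, sumR2 = 0;
        for (int i = 0; i < MR; i++) {
            for (int j = 0; j < NR; j++) {
                sumR += R[i][j];
                sumR2 += (double) R[i][j] * R[i][j];
            }
        }
        double meanR = sumR / K;
        double SR = Math.sqrt(Math.max(0, sumR2 - K * meanR * meanR));

        // Step 2: correlation coefficient for every offset (r, s), Eqn. (23.15)
        float[][] C = new float[MI - MR + 1][NI - NR + 1];
        for (int r = 0; r <= MI - MR; r++) {
            for (int s = 0; s <= NI - NR; s++) {
                double sumI = 0, sumI2 = 0, sumIR = 0;
                for (int i = 0; i < MR; i++) {
                    for (int j = 0; j < NR; j++) {
                        double aI = I[r + i][s + j], aR = R[i][j];
                        sumI += aI;
                        sumI2 += aI * aI;
                        sumIR += aI * aR;
                    }
                }
                double SI = Math.sqrt(Math.max(0, sumI2 - sumI * sumI / K));
                C[r][s] = (float) ((sumIR - sumI * meanR) / (1 + SI * SR));
            }
        }
        return C;
    }
}
```

With this sketch it is easy to confirm that adding a constant to the template (as in Fig. 23.4) leaves the resulting map unchanged up to rounding.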


Examples  and  discussion 

Figure 23.3 compares the performance of the described distance functions in a typical example. The original image (Fig. 23.3(a)) shows a repetitive flower pattern recorded under uneven lighting, resulting in varying local brightness. One instance of the repetitive pattern was extracted as the reference image (Fig. 23.3(b)).

• The sum of absolute differences (Eqn. (23.2)) in Fig. 23.3(c) shows a distinct peak value at the original template position, as does the Euclidean distance (Eqn. (23.4)) in Fig. 23.3(e). Both measures work satisfactorily in this regard but are strongly affected by global intensity changes, as demonstrated in Figs. 23.4 and 23.5.

• The maximum difference (Eqn. (23.3)) in Fig. 23.3(d) proves completely useless as a distance measure since it responds more strongly to the lighting changes than to pattern similarity. As expected, the behavior of the global cross correlation in Fig. 23.3(f) is also unsatisfactory. Although the result exhibits a local maximum at the true template position (hardly visible in the printed image), it is completely dominated by the high-intensity responses in the brighter parts of the image.

• The result from the normalized cross correlation in Fig. 23.3(g) naturally appears very similar to the Euclidean distance (Fig. 23.3(e)), because in principle it is the same measure. As expected, the correlation coefficient (Eqn. (23.10)) in Fig. 23.3(h) yields the best results. Distinct peaks of similar intensity are produced for all six instances of the template pattern, and the result is unaffected by changing lighting conditions. In this case, the values range from −1.0 (black) to +1.0 (white), and zero values are shown as gray.

Alg. 23.1
Calculation of the correlation coefficient. Given is the search image I and the reference image (template) R. In Step 1, the template's average $\bar{R}$ and variance term $S_R$ are computed once. In Step 2, the match function is computed for every template position (r, s) as prescribed by Eqn. (23.15). The result is a map of correlation values $C(r,s) \in [-1, 1]$ that is returned. In line 23 (cf. Eqn. (23.15)) the quantity 1 is added to the denominator to avoid division by zero in the case of zero variance.

 1: CorrelationCoefficient(I, R)
      Input: I(u,v), search image; R(i,j), reference image.
      Returns a map C(r,s) containing the values of the correlation
      coefficient between I and R positioned at (r,s).

      Step 1: initialize
 2:   (M_I, N_I) ← Size(I)
 3:   (M_R, N_R) ← Size(R)
 4:   K ← M_R · N_R
 5:   Σ_R ← 0, Σ_R2 ← 0
 6:   for i ← 0, ..., (M_R − 1) do
 7:     for j ← 0, ..., (N_R − 1) do
 8:       Σ_R ← Σ_R + R(i,j)
 9:       Σ_R2 ← Σ_R2 + R²(i,j)
10:   R̄ ← Σ_R / K                                        ▷ Eq. 23.11
11:   S_R ← (Σ_R2 − K · R̄²)^(1/2)                        ▷ Eq. 23.14

      Step 2: compute the correlation map
12:   Create map C: (M_I − M_R + 1) × (N_I − N_R + 1) ↦ ℝ
13:   for r ← 0, ..., M_I − M_R do                       ▷ place R at position (r,s)
14:     for s ← 0, ..., N_I − N_R do
          Compute the correlation coefficient for position (r,s):
15:       Σ_I ← 0, Σ_I2 ← 0, Σ_IR ← 0
16:       for i ← 0, ..., M_R − 1 do
17:         for j ← 0, ..., N_R − 1 do
18:           a_I ← I(r+i, s+j)
19:           a_R ← R(i,j)
20:           Σ_I ← Σ_I + a_I
21:           Σ_I2 ← Σ_I2 + a_I²
22:           Σ_IR ← Σ_IR + a_I · a_R
23:       C(r,s) ← (Σ_IR − Σ_I · R̄) / (1 + (Σ_I2 − Σ_I²/K)^(1/2) · S_R)   ▷ C(r,s) ∈ [−1, 1]
24:   return C


Figure 23.4 compares the results of the Euclidean distance against the correlation coefficient under globally changing intensity. For this purpose, the intensity of the reference image R is raised by 50 units such that the template is different from any subpattern in the original image. As can clearly be seen, the initially distinct peaks disappear under the Euclidean distance (Fig. 23.4(c)), while the correlation coefficient (Fig. 23.4(d)) naturally remains unaffected by this change.

In summary, the correlation coefficient can be recommended as a reliable measure for template matching in intensity images under realistic lighting conditions. This method proves relatively robust against global changes of brightness or contrast and tolerates small deviations from the reference pattern. Since the resulting values are in the fixed range of [−1, 1], a simple threshold operation can be used to localize the best match points (Fig. 23.6).
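The thresholding step shown in Fig. 23.6 can be combined with a simple local-maximum test so that each peak yields a single match point. The sketch below (hypothetical names) reports every offset (r, s) whose value C(r, s) exceeds the threshold and is not smaller than any of its 8 neighbors; the threshold itself, for instance 0.5 as in Fig. 23.6(c), is chosen by the user. Neighboring positions with exactly equal peak values would all be reported.

```java
/** Hypothetical helper: match points as thresholded local maxima of a map C[r][s]. */
final class MatchPoints {

    /** Returns offsets {r, s} with C(r,s) > threshold and no larger value among the 8 neighbors. */
    static java.util.List<int[]> find(float[][] C, float threshold) {
        java.util.List<int[]> points = new java.util.ArrayList<>();
        int M = C.length, N = C[0].length;
        for (int r = 0; r < M; r++) {
            for (int s = 0; s < N; s++) {
                float c = C[r][s];
                if (c <= threshold) {
                    continue;                       // below threshold: not a match candidate
                }
                boolean isMax = true;
                for (int dr = -1; dr <= 1 && isMax; dr++) {
                    for (int ds = -1; ds <= 1; ds++) {
                        int rr = r + dr, ss = s + ds;
                        boolean inside = rr >= 0 && rr < M && ss >= 0 && ss < N;
                        if ((dr != 0 || ds != 0) && inside && C[rr][ss] > c) {
                            isMax = false;          // a neighbor is larger
                            break;
                        }
                    }
                }
                if (isMax) {
                    points.add(new int[] {r, s});
                }
            }
        }
        return points;
    }
}
```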
Fig. 23.3
Comparison of various distance functions. From the original image I (a), the marked section is used as the reference image R, shown enlarged in (b). In the resulting difference images (c–h), brightness corresponds to the amount of agreement (white equals minimum distance). The position of the true reference point is marked by a red circle.
[Figure panels: (a) Original image I; (b) Reference image R; (c) Sum of absolute differences; (d) Maximum difference; (e) Sum of squared distances; (g) Normalized cross correlation; (h) Correlation coefficient.]
Fig. 23.4
Effects of changing global brightness. Original reference image R: the results from both the Euclidean distance (a) and the correlation coefficient (b) show distinct peaks at the positions of maximum agreement. Modified reference image R' = R + 50: the peak values disappear in the Euclidean distance (c), while the correlation coefficient (d) remains unaffected.
[Figure panels: original reference image R: (a) Euclidean distance $d_E(r,s)$, (b) correlation coefficient $C_L(r,s)$; modified reference image R' = R + 50: (c) Euclidean distance $d_E(r,s)$, (d) correlation coefficient $C_L(r,s)$.]

Fig. 23.5
Euclidean distance under global intensity changes. Distance function for the original template R (left), with the template intensity increased by 25 units (center) and 50 units (right). Notice that the local peaks disappear as the template intensity (and thus the total distance between the image and the template) is increased.

Fig. 23.6
Detection of match points by simple thresholding: correlation coefficient (a), positive values only (b), and values greater than 0.5 (c). The remaining peaks indicate the positions of the six similar (but not identical) tulip patterns in the original image (Fig. 23.3(a)).

Shape of the template

The shape of the reference image does not need to be rectangular as in the previous examples, although a rectangle is convenient for processing. In some applications, circular, elliptical, or custom-shaped templates may be more appropriate than a rectangle. In such a case, the template may still be stored in a rectangular array, but the relevant pixels must somehow be marked (e.g., using a binary mask).

Even more general is the option to assign individual continuous weights to the template elements such that, for example, the center of a template can be given higher significance in the match than the peripheral regions. Implementing such a “windowed matching” technique should be straightforward and require only minor modifications to the standard approach.
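One possible way to realize such a windowed match is a weighted version of the correlation coefficient (Eqn. (23.10)), in which every template position (i, j) carries a non-negative weight w(i, j) that enters the means, the covariance, and both variance terms. This formulation and the names used below are our assumptions rather than the book's method. With binary weights it reduces to the masked, arbitrarily shaped template described in the previous paragraph.

```java
/** Hypothetical weighted ("windowed") correlation coefficient; images as float[][] indexed [u][v]. */
final class WeightedMatch {

    /** Weighted correlation coefficient between R and the subimage of I at offset (r, s). */
    static double weightedCorrCoeff(float[][] I, float[][] R, double[][] w, int r, int s) {
        int MR = R.length, NR = R[0].length;
        // weighted means of the current subimage and of the template
        double sumW = 0, sumI = 0, sumR = 0;
        for (int i = 0; i < MR; i++) {
            for (int j = 0; j < NR; j++) {
                sumW += w[i][j];
                sumI += w[i][j] * I[r + i][s + j];
                sumR += w[i][j] * R[i][j];
            }
        }
        if (sumW <= 0) {
            return 0;                                 // no weights at all: no meaningful match value
        }
        double meanI = sumI / sumW, meanR = sumR / sumW;
        // weighted covariance and weighted variance terms
        double cov = 0, varI = 0, varR = 0;
        for (int i = 0; i < MR; i++) {
            for (int j = 0; j < NR; j++) {
                double dI = I[r + i][s + j] - meanI;
                double dR = R[i][j] - meanR;
                cov += w[i][j] * dI * dR;
                varI += w[i][j] * dI * dI;
                varR += w[i][j] * dR * dR;
            }
        }
        double denom = Math.sqrt(varI * varR);
        return (denom > 0) ? cov / denom : 0;         // 0 if a pattern is flat within the window
    }
}
```

With a binary mask that excludes a corrupted template pixel, the sketch returns 1 for an otherwise perfect match, whereas uniform weights yield a clearly smaller value.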
23.1.2 Matching Under Rotation and Scaling

Correlation-based matching methods applied in the way described in this section cannot handle significant rotation or scale differences between the search image and the template. One obvious way to overcome rotation is to match using multiple rotated versions of the template, of course at the price of additional computation time. Similarly, one could try to match using several scaled versions of the template to achieve scale independence to some extent. Although both could be combined by using a set of rotated and scaled template patterns, the combinatorially growing number of required matching steps could soon become prohibitive for a practical implementation.

An interesting technique is matching in log-polar space, where rotation and scaling map to translations and can thus be handled with correlation-type methods [267]. However, this requires an initial “anchor point”, which again needs to be detected in a rotation- and scale-invariant way [152,209,238]. Another alternative is the popular Lucas-Kanade technique for elastic local matching, which is described in detail in Chapter 24. In principle, given an approximate starting solution, this method can handle not only rotation and scaling but also arbitrary image transformations or distortions.

23.1.3  Java  Implementation 

Implementations of most methods described in this chapter are openly available as part of the imagingbook library.3 As an example, the code listed in Progs. 23.1 and 23.2 demonstrates the use of the CorrCoeffMatcher class for template matching based on the local correlation coefficient (Eqn. (23.10)). The application assumes that the search image (I) and the reference image (R) are already available as objects of type FloatProcessor. A new instance of class CorrCoeffMatcher is created for the search image I, and the reference image R is passed to its getMatch() method, as shown in the following code segment:

FloatProcessor I = ...   // search image
FloatProcessor R = ...   // reference image
CorrCoeffMatcher matcher = new CorrCoeffMatcher(I);
float[][] C = matcher.getMatch(R);

The correlation coefficient is computed by the method getMatch() and returned as a 2D float array (C), indexed as C[r][s].


23.2  Matching  Binary  Images 

As  became  evident  in  the  previous  section,  the  comparison  of  inten¬ 
sity  images  based  on  correlation  may  not  be  an  optimal  solution  but 
is  sufficiently  reliable  and  efficient  under  certain  restrictions.  If  we 
compare  binary  images  in  the  same  way,  by  counting  the  number  of 
identical  pixels  in  the  search  image  and  the  template,  the  total  dif¬ 
ference  will  only  be  small  when  most  pixels  are  in  exact  agreement. 

Q 

Package  imagingbook . pub . mat  ching. 
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package  imagingbook . pub . mat  ching ; 
import  i j . process . FloatProcessor ; 
class  CorrCoef fMatcher  { 

private  final  FloatProcessor  I;  // search  image 


private  final  int  MI,  NI ; 

private  FloatProcessor  R; 
private  int  MR,  NR; 
private  int  K; 
private  double  meanR; 
private  double  varR; 

(o-r) 


//  width/height  of  search  image 

//  reference  image 
//  width/height  of  reference  image 

//  mean  value  of  reference  ( R ) 

//  square  root  of  reference  variance 


public  CorrCoef  fMatcher  (FloatProcessor  I)  {  //constructor 
this. I  =  I; 

this. MI  =  this  .  I .  getWidthO  ; 
this.NI  =  this . I . getHeight () ; 

} 

public  f  loat  []  []  getMatch  (FloatProcessor  R)  { 
this.R  =  R; 

this. MR  =  R. getWidthO  ; 
this. NR  =  R. getHeight () ; 
this.K  =  MR  *  NR; 

//  calculate  the  mean  ( R )  and  variance  term  (SR)  of  the  template: 
double  sumR  =  0;  //  UR  =  y]  R(i:  j ) 

double  sumR2  =  0;  //  UR2  —  R2 (z,  j) 

for  (int  j  =  0;  j  <  NR;  j++)  { 
for  (int  i  =  0;  i  <  MR;  i++)  { 
float  aR  =  R.getf(i,j); 
sumR  +=  aR; 


sumR2  +=  aR  *  aR; 


} 


} 


this. meanR  =  sumR  /  K; 
this.varR  = 

Math. sqrt (sumR2  -  K 


//R=[£R(i,j)]/K 

HSR  =  E  R2{i,j)  -  K-R2}1/2 

meanR  *  meanR) ; 


float  []  []  C  =  new  float  [MI  -  MR  +  1]  [NI  -  NR  +  1]  ; 
for  (int  r  =  0;  r  <=  MI  -  MR;  r++)  { 
for  (int  s  =  0;  s  <=  NI  -  NR;  s++)  { 
float  d  =  (float)  getMatchValue (r ,  s) ; 

C  [r]  [s]  =  d; 

} 

} 

return  C; 


} 


//  continued... 


Prog.  23.1 

Implementation  of  class 
CorrCoef fMatcher  (part  1/2). 
The  constructor  method  (lines 
16—20)  calculates  the  mean 
R  =  meanR  (Eqn.  (23.11))  and 
the  variance  SR  =  varR  (Eqn. 
(23.14))  of  the  reference  image 
R.  The  method  getMatch(R) 
(lines  22—51)  determines  the 
match  values  between  the 
search  image  /  and  the  refer¬ 
ence  image  R  f  for  all  positions 
(r,  s). 
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23  Image  Matching  and 
Registration 

Prog.  23.2 

Implementation  of  class 
CorrCoef fMatcher  (part 
2/2).  The  local  match  value 
C(r,s)  (see  Eqn.  (23.15)) 
at  the  individual  posi¬ 
tion  (r,  s)  is  calculated  by 
method  getMatchValue(r,s) 
(lines  54—72). 


  private double getMatchValue(int r, int s) {
    double sumI = 0;   // ΣI = Σ I(r+i, s+j)
    double sumI2 = 0;  // ΣI2 = Σ (I(r+i, s+j))²
    double sumIR = 0;  // ΣIR = Σ I(r+i, s+j) · R(i,j)

    for (int j = 0; j < NR; j++) {
      for (int i = 0; i < MR; i++) {
        float aI = I.getf(r + i, s + j);
        float aR = R.getf(i, j);
        sumI += aI;
        sumI2 += aI * aI;
        sumIR += aI * aR;
      }
    }

    double meanI = sumI / K;  // Ī(r,s) = ΣI / K
    // the added 1 keeps the denominator nonzero if I or R is constant over the patch
    return (sumIR - K * meanI * meanR) /
        (1 + Math.sqrt(sumI2 - K * meanI * meanI) * varR);
  }

}  // end of class CorrCoeffMatcher
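The method getMatchValue() does not compute the correlation coefficient from mean-subtracted values but from running sums, using the identities Σ(I − Ī)(R − R̄) = ΣIR − K·Ī·R̄ and Σ(I − Ī)² = ΣI² − K·Ī². The following stand-alone sketch checks numerically that both forms agree. It is only an illustration under simplifying assumptions: the class name CorrCoefCheck is hypothetical, pixel values are passed as flattened double arrays instead of ImageJ processors, and the "+1" safeguard in the denominator is left out.

```java
public class CorrCoefCheck {

    // Correlation coefficient by the direct definition:
    // Σ(a - ā)(b - b̄) / sqrt(Σ(a - ā)² · Σ(b - b̄)²)
    static double direct(double[] a, double[] b) {
        int K = a.length;
        double ma = 0, mb = 0;
        for (int k = 0; k < K; k++) { ma += a[k]; mb += b[k]; }
        ma /= K;
        mb /= K;
        double sab = 0, saa = 0, sbb = 0;
        for (int k = 0; k < K; k++) {
            sab += (a[k] - ma) * (b[k] - mb);
            saa += (a[k] - ma) * (a[k] - ma);
            sbb += (b[k] - mb) * (b[k] - mb);
        }
        return sab / Math.sqrt(saa * sbb);
    }

    // Same quantity from the running sums used in getMatchValue():
    // (sumIR - K·meanI·meanR) / (sqrt(sumI2 - K·meanI²) · sigmaR), without the "+1"
    static double fromSums(double[] a, double[] b) {
        int K = a.length;
        double sumI = 0, sumI2 = 0, sumIR = 0, sumR = 0, sumR2 = 0;
        for (int k = 0; k < K; k++) {
            sumI += a[k];
            sumI2 += a[k] * a[k];
            sumIR += a[k] * b[k];
            sumR += b[k];
            sumR2 += b[k] * b[k];
        }
        double meanI = sumI / K, meanR = sumR / K;
        double sigmaR = Math.sqrt(sumR2 - K * meanR * meanR);
        return (sumIR - K * meanI * meanR) / (Math.sqrt(sumI2 - K * meanI * meanI) * sigmaR);
    }

    public static void main(String[] args) {
        double[] patch = {10, 20, 30, 40, 55, 60};  // image values under the template
        double[] templ = {1, 2, 3, 4, 5, 6};         // template values
        System.out.println(direct(patch, templ));
        System.out.println(fromSums(patch, templ));  // same value up to rounding
    }
}
```

The running-sum form needs only a single pass over the template area per position, which is why it is used in Prog. 23.2.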

Since binary images have no continuous transitions between pixel values, the match function produced by a simple distance measure is generally ill-behaved (i.e., highly discontinuous, with many local extrema; see Fig. 23.7).

23.2.1 Direct Comparison of Binary Images

The problem with directly comparing binary images is that even the smallest deviations between image patterns, such as those caused by a small shift, rotation, or distortion, can create very high distance values. Shifting a thin line drawing by only a single pixel, for example, may be sufficient to switch from full agreement to no agreement at all (i.e., from zero difference to maximum difference). Thus a simple distance function gives no indication of how far away and in which direction to search for a better match position.
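To make the single-pixel-shift argument concrete, here is a minimal sketch of the direct comparison score described for Fig. 23.7, i.e., the fraction of template foreground pixels that fall onto image foreground pixels. The class name DirectBinaryMatch and the representation of binary images as boolean[u][v] arrays are assumptions made for illustration.

```java
public class DirectBinaryMatch {

    // Fraction of template foreground pixels that land on image foreground
    // pixels when template R is placed at position (r, s) in image I.
    // Arrays are indexed as img[u][v] (u = column, v = row).
    static double match(boolean[][] I, boolean[][] R, int r, int s) {
        int n = 0, hits = 0;
        for (int i = 0; i < R.length; i++) {
            for (int j = 0; j < R[i].length; j++) {
                if (R[i][j]) {
                    n++;
                    if (I[r + i][s + j]) hits++;
                }
            }
        }
        return (n == 0) ? 0.0 : (double) hits / n;
    }

    public static void main(String[] args) {
        // 8 x 8 image with a thin vertical line at u = 3
        boolean[][] I = new boolean[8][8];
        for (int v = 0; v < 8; v++) I[3][v] = true;
        // 3 x 5 template with the same thin line in its middle column (i = 1)
        boolean[][] R = new boolean[3][5];
        for (int j = 0; j < 5; j++) R[1][j] = true;

        System.out.println(match(I, R, 2, 1));  // template line on image line: 1.0
        System.out.println(match(I, R, 3, 1));  // shifted by one pixel: 0.0
    }
}
```

A one-pixel shift drops the score from 1.0 to 0.0, and nothing in the score indicates in which direction the better position lies.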

An interesting question is how matching of binary images can be made more tolerant of small differences between the compared patterns. The goal is thus not only to detect the single image position where most foreground pixels in the two images match up, but also (if possible) to obtain a measure indicating how far (in geometric terms) we are away from this position.

23.2.2 The Distance Transform

A first step in this direction is to record, for every position $\boldsymbol{u} = (u, v)$ in the search image I, the distance to the closest foreground pixel. This gives us the minimum distance (though not the direction) for shifting a particular pixel onto a foreground pixel. Starting from a binary image $I(u, v) = I(\boldsymbol{u})$, we denote


y?  f£  C  2) 

n  ifi  \t> 

£  7  9  7f 

3 

7  S  jK  I 

i  Hi  ;fi  4'  l/‘ 

(a)  (b)  (c) 

$$\mathrm{FG}(I) = \{\boldsymbol{u} \mid I(\boldsymbol{u}) = 1\}, \tag{23.16}$$

$$\mathrm{BG}(I) = \{\boldsymbol{u} \mid I(\boldsymbol{u}) = 0\}, \tag{23.17}$$


Fig. 23.7

Direct comparison of binary images. Given are a binary search image (a) and a binary reference image (b). The local similarity value for any template position corresponds to the relative number of matching (black) foreground pixels. High similarity values are shown as bright spots in the result (c). While the maximum similarity is naturally found at the correct position (at the center of the glyph B), the match function is highly irregular, with many local maxima.


as the sets of coordinates of the foreground and background pixels, respectively. The so-called distance transform of I, $D(\boldsymbol{u}) \in \mathbb{R}$, is defined as

$$D(\boldsymbol{u}) := \min_{\boldsymbol{u}' \in \mathrm{FG}(I)} \mathrm{dist}(\boldsymbol{u}, \boldsymbol{u}'), \tag{23.18}$$

for all $\boldsymbol{u} = (u, v)$, where $u = 0, \ldots, M-1$ and $v = 0, \ldots, N-1$ (for an image of size $M \times N$). The value D at a given position $\boldsymbol{u}$ thus equals the distance between $\boldsymbol{u}$ and the nearest foreground pixel in I. If $I(\boldsymbol{u})$ is a foreground pixel itself (i.e., $\boldsymbol{u} \in \mathrm{FG}(I)$), then the distance $D(\boldsymbol{u}) = 0$, since no shift is necessary for moving this pixel onto a foreground pixel.

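The function $\mathrm{dist}(\boldsymbol{u}, \boldsymbol{u}')$ in Eqn. (23.18) measures the geometric distance between the two coordinate points $\boldsymbol{u} = (u, v)$ and $\boldsymbol{u}' = (u', v')$. Examples of suitable distance functions are the Euclidean distance (L2 norm)

$$d_E(\boldsymbol{u}, \boldsymbol{u}') = \|\boldsymbol{u} - \boldsymbol{u}'\| = \sqrt{(u - u')^2 + (v - v')^2} \in \mathbb{R}^{+} \tag{23.19}$$

and the Manhattan distance⁴ (L1 norm)

$$d_M(\boldsymbol{u}, \boldsymbol{u}') = |u - u'| + |v - v'| \in \mathbb{N}_0. \tag{23.20}$$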

Figure 23.8 shows a simple example of a distance transform using the Manhattan distance $d_M$.

The direct calculation of the distance transform (following the definition in Eqn. (23.18)) is computationally expensive, because the closest foreground pixel must be found for each pixel position $\boldsymbol{u}$ (unless $I(\boldsymbol{u})$ is a foreground pixel itself).⁵
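As a reference point for the chamfer method, a brute-force implementation of Eqn. (23.18) along the lines of footnote 5 is easy to write. The following is a minimal sketch, not the book's implementation: the class name BruteForceDT and the boolean[u][v] image representation are assumptions. It supports both $d_E$ (Eqn. (23.19)) and $d_M$ (Eqn. (23.20)) and performs a full scan over the image for every pixel.

```java
public class BruteForceDT {

    // Exact distance transform by exhaustive search (Eqn. (23.18)).
    // I[u][v] is true for foreground pixels; returns D[u][v].
    // useEuclidean = true selects d_E, false selects d_M.
    // Pixels keep +infinity if the image has no foreground pixel at all.
    static double[][] distanceTransform(boolean[][] I, boolean useEuclidean) {
        int M = I.length, N = I[0].length;
        double[][] D = new double[M][N];
        for (int u = 0; u < M; u++) {
            for (int v = 0; v < N; v++) {
                double dmin = Double.POSITIVE_INFINITY;
                // full scan over all pixels: O(M·N) work for each pixel (u, v)
                for (int up = 0; up < M; up++) {
                    for (int vp = 0; vp < N; vp++) {
                        if (!I[up][vp]) continue;  // consider foreground pixels u' only
                        int du = u - up, dv = v - vp;
                        double d = useEuclidean
                            ? Math.sqrt(du * du + dv * dv)
                            : Math.abs(du) + Math.abs(dv);
                        if (d < dmin) dmin = d;
                    }
                }
                D[u][v] = dmin;
            }
        }
        return D;
    }

    public static void main(String[] args) {
        boolean[][] I = new boolean[5][5];
        I[2][2] = true;  // single foreground pixel in the center
        double[][] DM = distanceTransform(I, false);
        double[][] DE = distanceTransform(I, true);
        System.out.println(DM[0][0]);  // |0-2| + |0-2| = 4.0
        System.out.println(DE[0][0]);  // sqrt(8), about 2.83
    }
}
```

For an N × N image this takes O(N⁴) steps, so it is only practical for small images, but it is handy for checking approximate results (as in Exercise 23.2).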


Chamfer algorithm

The so-called chamfer algorithm [30] is an efficient method for computing the distance transform. Similar to the sequential region labeling algorithm (see Ch. 10, Alg. 10.2), the chamfer algorithm traverses

4 Also called “city block distance”.

5 A simple (brute-force) algorithm for the distance transform would perform a full scan over the entire image for each processed pixel, resulting in O(N² · N²) = O(N⁴) steps for an image of size N × N.




Fig. 23.8

Example of a distance transform of a binary image using the Manhattan distance $d_M$. Foreground pixels in the binary image have value 1 (shown inverted).


[Fig. 23.8 panels: binary image, distance transform]
the image twice by propagating the computed values across the image like a wave. The first traversal starts at the upper left corner of the image and propagates the distance values downward in a diagonal direction. The second traversal proceeds in the opposite direction, from the bottom right back to the top. For each traversal, a “distance mask” is used for the propagation of the distance values, that is,


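$$M^{L} = \begin{bmatrix} m_2 & m_1 & m_2 \\ m_1 & \times & \cdot \\ \cdot & \cdot & \cdot \end{bmatrix} \quad\text{and}\quad M^{R} = \begin{bmatrix} \cdot & \cdot & \cdot \\ \cdot & \times & m_1 \\ m_2 & m_1 & m_2 \end{bmatrix} \tag{23.21}$$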

for the first and second traversals, respectively. The values in $M^{L}$ and $M^{R}$ describe the geometric distance between the current pixel (marked ×) and the relevant neighboring pixels. They depend on the distance function $\mathrm{dist}(\boldsymbol{u}, \boldsymbol{u}')$ used. Algorithm 23.2 outlines the chamfer method for computing the distance transform $D(u, v)$ of a binary image $I(u, v)$ using the above distance masks.

For the Manhattan distance (Eqn. (23.20)), the chamfer algorithm computes the distance transform exactly using the masks

$$M^{L}_{M} = \begin{bmatrix} 2 & 1 & 2 \\ 1 & \times & \cdot \\ \cdot & \cdot & \cdot \end{bmatrix} \quad\text{and}\quad M^{R}_{M} = \begin{bmatrix} \cdot & \cdot & \cdot \\ \cdot & \times & 1 \\ 2 & 1 & 2 \end{bmatrix}. \tag{23.22}$$


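Similarly, the Euclidean distance (Eqn. (23.19)) can be approximated with the masks

$$M^{L}_{E} = \begin{bmatrix} \sqrt{2} & 1 & \sqrt{2} \\ 1 & \times & \cdot \\ \cdot & \cdot & \cdot \end{bmatrix} \quad\text{and}\quad M^{R}_{E} = \begin{bmatrix} \cdot & \cdot & \cdot \\ \cdot & \times & 1 \\ \sqrt{2} & 1 & \sqrt{2} \end{bmatrix}. \tag{23.23}$$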


Note that the result obtained with these masks is only an approximation of the Euclidean distance to the nearest foreground pixel, although it is more accurate than the estimate produced by the Manhattan distance. As demonstrated by the examples in Fig. 23.9, the distances obtained with the Euclidean masks are exact along the coordinate axes and the diagonals but are overestimated (i.e., too
1: DistanceTransform(I, norm)
   Input: I, a binary image; norm ∈ {L1, L2}, distance function.
   Returns the distance transform of I.

   Step 1: initialize
2:   (m1, m2) ← (1, 2) for norm = L1, (1, √2) for norm = L2
3:   (M, N) ← Size(I)
4:   Create map D: M × N → ℝ
5:   for all (u, v) ∈ M × N do
6:     D(u, v) ← 0 if I(u, v) > 0, ∞ otherwise

   Step 2: L-pass
7:   for v ← 0, ..., N−1 do                     ▷ top → bottom
8:     for u ← 0, ..., M−1 do                   ▷ left → right
9:       if D(u, v) > 0 then
10:        d1, d2, d3, d4 ← ∞
11:        if u > 0 then
12:          d1 ← m1 + D(u−1, v)
13:          if v > 0 then
14:            d2 ← m2 + D(u−1, v−1)
15:        if v > 0 then
16:          d3 ← m1 + D(u, v−1)
17:          if u < M−1 then
18:            d4 ← m2 + D(u+1, v−1)
19:        D(u, v) ← min(D(u, v), d1, d2, d3, d4)

   Step 3: R-pass
20:  for v ← N−1, ..., 0 do                     ▷ bottom → top
21:    for u ← M−1, ..., 0 do                   ▷ right → left
22:      if D(u, v) > 0 then
23:        d1, d2, d3, d4 ← ∞
24:        if u < M−1 then
25:          d1 ← m1 + D(u+1, v)
26:          if v < N−1 then
27:            d2 ← m2 + D(u+1, v+1)
28:        if v < N−1 then
29:          d3 ← m1 + D(u, v+1)
30:          if u > 0 then
31:            d4 ← m2 + D(u−1, v+1)
32:        D(u, v) ← min(D(u, v), d1, d2, d3, d4)
33:  return D

23.2  Matching  Binary 
Images 

Alg. 23.2

Chamfer algorithm for computing the distance transform. From the binary image I, the distance transform D (Eqn. (23.18)) is computed using a pair of distance masks (Eqn. (23.21)) for the first and second passes. Notice that the image borders require special treatment.


high) for all other directions. A more precise approximation can be obtained with larger distance masks (e.g., 5 × 5 pixels; see Exercise 23.3), which include the exact distances to pixels in a larger neighborhood [30]. Furthermore, floating-point operations can be avoided by using distance masks with scaled integer values, such as the masks

$$M^{L}_{E'} = \begin{bmatrix} 4 & 3 & 4 \\ 3 & \times & \cdot \\ \cdot & \cdot & \cdot \end{bmatrix} \quad\text{and}\quad M^{R}_{E'} = \begin{bmatrix} \cdot & \cdot & \cdot \\ \cdot & \times & 3 \\ 4 & 3 & 4 \end{bmatrix} \tag{23.24}$$


Fig. 23.9

Distance transform with the chamfer algorithm: original image with black foreground pixels (a), and results of distance transforms using the Manhattan distance (b) and the Euclidean distance (c). The brightness (scaled to maximum contrast) corresponds to the estimated distance to the nearest foreground pixel.

[Fig. 23.9 panels: (a) original image, (b) Manhattan distance, (c) Euclidean distance (approx.)]


for the Euclidean distance. Compared with the original masks (Eqn. (23.23)), the resulting distance values are scaled by a factor of about 3.
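A direct Java transcription of Alg. 23.2 may help to see how the two passes and the border tests fit together. The sketch below is not the book's DistanceTransform class (which is not listed here); the class name ChamferDistanceTransform, the int[u][v] image representation, and passing the mask weights (m1, m2) directly are assumptions. The same code covers the Manhattan masks (Eqn. (23.22)), the Euclidean masks (Eqn. (23.23)), and the integer masks (Eqn. (23.24)).

```java
public class ChamferDistanceTransform {

    // Two-pass chamfer distance transform following Alg. 23.2.
    // I[u][v] > 0 marks foreground pixels; m1 and m2 are the mask weights
    // for horizontal/vertical and diagonal neighbors, e.g.
    //   (1, 2)        Manhattan masks, Eqn. (23.22)
    //   (1, sqrt(2))  approximate Euclidean masks, Eqn. (23.23)
    //   (3, 4)        scaled integer masks, Eqn. (23.24)
    static double[][] transform(int[][] I, double m1, double m2) {
        int M = I.length, N = I[0].length;
        double[][] D = new double[M][N];
        // Step 1: foreground pixels get 0, all others infinity
        for (int u = 0; u < M; u++)
            for (int v = 0; v < N; v++)
                D[u][v] = (I[u][v] > 0) ? 0 : Double.POSITIVE_INFINITY;

        // Step 2: L-pass (top -> bottom, left -> right) with mask M^L
        for (int v = 0; v < N; v++) {
            for (int u = 0; u < M; u++) {
                if (D[u][v] > 0) {
                    double d = D[u][v];
                    if (u > 0)              d = Math.min(d, m1 + D[u - 1][v]);
                    if (u > 0 && v > 0)     d = Math.min(d, m2 + D[u - 1][v - 1]);
                    if (v > 0)              d = Math.min(d, m1 + D[u][v - 1]);
                    if (u < M - 1 && v > 0) d = Math.min(d, m2 + D[u + 1][v - 1]);
                    D[u][v] = d;
                }
            }
        }
        // Step 3: R-pass (bottom -> top, right -> left) with mask M^R
        for (int v = N - 1; v >= 0; v--) {
            for (int u = M - 1; u >= 0; u--) {
                if (D[u][v] > 0) {
                    double d = D[u][v];
                    if (u < M - 1)              d = Math.min(d, m1 + D[u + 1][v]);
                    if (u < M - 1 && v < N - 1) d = Math.min(d, m2 + D[u + 1][v + 1]);
                    if (v < N - 1)              d = Math.min(d, m1 + D[u][v + 1]);
                    if (u > 0 && v < N - 1)     d = Math.min(d, m2 + D[u - 1][v + 1]);
                    D[u][v] = d;
                }
            }
        }
        return D;
    }

    public static void main(String[] args) {
        int[][] I = new int[7][7];
        I[3][3] = 1;  // single foreground pixel
        double[][] DM  = transform(I, 1, 2);
        double[][] DE  = transform(I, 1, Math.sqrt(2));
        double[][] D34 = transform(I, 3, 4);
        // pixel (5, 4) is at offset (2, 1) from the foreground pixel
        System.out.println(DM[5][4]);   // 3.0 (exact Manhattan distance)
        System.out.println(DE[5][4]);   // 1 + sqrt(2); the exact Euclidean value is sqrt(5)
        System.out.println(D34[5][4]);  // 7.0, roughly 3 times sqrt(5)
    }
}
```

For the offset (2, 1), the Euclidean masks yield 1 + √2 ≈ 2.414 instead of the exact √5 ≈ 2.236, which illustrates the overestimation in directions other than the axes and diagonals.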

23.2.3 Chamfer Matching

The chamfer algorithm offers an efficient way to approximate the distance transform for a binary image of arbitrary size. The next step is to use the distance transform for matching binary images. Chamfer matching (first described in [19]) uses the distance transform to localize the points of maximum agreement between a binary search image I and a binary reference image (template) R. Instead of counting the overlapping foreground pixels as in the direct approach (see Sec. 23.2.1), chamfer matching uses the accumulated values of the distance transform as the match score Q. At each position (r, s) of the template R, the distance values corresponding to all foreground pixels in R are accumulated, that is,

$$Q(r, s) = \frac{1}{K} \cdot \sum_{(i,j) \in \mathrm{FG}(R)} D(r + i, s + j), \tag{23.25}$$

where $K = |\mathrm{FG}(R)|$ denotes the number of foreground pixels in the template R.

The complete procedure for computing the match score Q is summarized in Alg. 23.3. If at some position each foreground pixel in the
1: ChamferMatch(I, R)
   Input: I, binary search image; R, binary reference image.
   Returns a 2D map of match scores.

   Step 1: initialize
2:   (M_I, N_I) ← Size(I)
3:   (M_R, N_R) ← Size(R)
4:   D ← DistanceTransform(I)                          ▷ Alg. 23.2
5:   Create map Q: (M_I − M_R + 1) × (N_I − N_R + 1) → ℝ

   Step 2: compute match function
6:   for r ← 0, ..., M_I − M_R do                       ▷ place R at (r, s)
7:     for s ← 0, ..., N_I − N_R do
         ▷ get match score for R placed at (r, s)
8:       q ← 0
9:       n ← 0                                         ▷ number of foreground pixels in R
10:      for i ← 0, ..., M_R − 1 do
11:        for j ← 0, ..., N_R − 1 do
12:          if R(i, j) > 0 then                       ▷ foreground pixel in R
13:            q ← q + D(r + i, s + j)
14:            n ← n + 1
15:      Q(r, s) ← q/n
16:  return Q


template R coincides with a foreground pixel in the image I, the sum of the distance values is zero, which indicates a perfect match. The more foreground pixels of the template fall onto distance values greater than zero, the larger the resulting score value Q (the normalized sum of distances). The best match is found at the global minimum of Q, that is,

$$(r_{\mathrm{opt}}, s_{\mathrm{opt}}) = \operatorname*{argmin}_{(r,s)} Q(r, s). \tag{23.26}$$
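Putting Eqns. (23.25) and (23.26) together, the following minimal, self-contained sketch computes the match map Q and its global minimum for a tiny synthetic example. To keep it short, it uses a brute-force Manhattan distance transform in place of the chamfer algorithm. The class and method names are hypothetical, and the score is normalized by K as in Eqn. (23.25).

```java
public class ChamferMatchSketch {

    // Brute-force Manhattan distance transform (Eqn. (23.18) with d_M);
    // only meant for tiny demo images. I[u][v] == true marks foreground.
    static double[][] distanceTransform(boolean[][] I) {
        int M = I.length, N = I[0].length;
        double[][] D = new double[M][N];
        for (int u = 0; u < M; u++)
            for (int v = 0; v < N; v++) {
                double d = Double.POSITIVE_INFINITY;
                for (int a = 0; a < M; a++)
                    for (int b = 0; b < N; b++)
                        if (I[a][b]) d = Math.min(d, Math.abs(u - a) + Math.abs(v - b));
                D[u][v] = d;
            }
        return D;
    }

    // Match score Q(r, s) of Eqn. (23.25): average of D under the K
    // foreground pixels of template R placed at (r, s).
    static double[][] chamferMatch(double[][] D, boolean[][] R) {
        int MR = R.length, NR = R[0].length;
        double[][] Q = new double[D.length - MR + 1][D[0].length - NR + 1];
        for (int r = 0; r < Q.length; r++)
            for (int s = 0; s < Q[0].length; s++) {
                double q = 0;
                int K = 0;
                for (int i = 0; i < MR; i++)
                    for (int j = 0; j < NR; j++)
                        if (R[i][j]) { q += D[r + i][s + j]; K++; }
                Q[r][s] = q / K;
            }
        return Q;
    }

    // Position (r_opt, s_opt) of the global minimum of Q (Eqn. (23.26)).
    static int[] argmin(double[][] Q) {
        int[] best = {0, 0};
        for (int r = 0; r < Q.length; r++)
            for (int s = 0; s < Q[0].length; s++)
                if (Q[r][s] < Q[best[0]][best[1]]) best = new int[] {r, s};
        return best;
    }

    public static void main(String[] args) {
        // template: an L-shaped corner of size 3 x 3
        boolean[][] R = new boolean[3][3];
        R[0][0] = R[0][1] = R[0][2] = R[1][2] = R[2][2] = true;
        // 10 x 10 search image containing the same corner at offset (5, 4)
        boolean[][] I = new boolean[10][10];
        for (int i = 0; i < 3; i++)
            for (int j = 0; j < 3; j++)
                if (R[i][j]) I[5 + i][4 + j] = true;
        double[][] Q = chamferMatch(distanceTransform(I), R);
        int[] opt = argmin(Q);
        System.out.println(opt[0] + ", " + opt[1]);  // prints "5, 4"
    }
}
```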

The example in Fig. 23.10 demonstrates the difference between direct pixel comparison and chamfer matching using the binary image shown in Fig. 23.7. The match score produced by the chamfer method is clearly much smoother and exhibits only a few distinct local extrema. This is a great advantage, because it facilitates the detection of optimal match points using simple local search methods. Figure 23.11 shows another example with circles and squares. The circles have different diameters, and the medium-sized circle is used as the template. As this example illustrates, chamfer matching is tolerant of small changes in scale between the search image and the template and even in this case yields a smooth score function with distinct peaks.

While chamfer matching is not a “silver bullet”, it is efficient and works sufficiently well under suitable conditions. It is best suited for matching line or edge images with a small percentage of foreground pixels, such as for registering aerial images or aligning wide-baseline stereo images. The method tolerates small deviations between the image and the template but is of course not generally invariant under scaling, rotation, and deformation. The quality of the results deteriorates quickly when images contain random noise (“clutter”) or large foreground regions, because


Alg. 23.3

Chamfer matching (calculation of the match function). Given are a binary search image I and a binary reference image (template) R. In step 1, the distance transform D is computed for the image I using the chamfer algorithm (Alg. 23.2). In step 2, the sum of distance values is accumulated for all foreground pixels in template R for each template position (r, s). The resulting scores are stored in the 2D match map Q, which is returned.
Fig. 23.10

Direct pixel comparison vs. chamfer matching (see original images in Fig. 23.7). Unlike the results of the direct pixel comparison (a), the chamfer match score Q (b) is much smoother. It shows distinct peak values in places of high agreement that are easy to track down with local search methods. The match score Q (Eqn. (23.25)) in (b) is shown inverted for easy comparison.


[Fig. 23.10 panels: (a) direct comparison, (b) chamfer matching]


the method is based on minimizing the distances to foreground pixels. One way to reduce the probability of false matches is to add up the squared distances instead of using a linear summation (as in Eqn. (23.25)), that is, to use

$$Q_{\mathrm{rms}}(r, s) = \Big[\frac{1}{K} \cdot \sum_{(i,j) \in \mathrm{FG}(R)} \big(D(r + i, s + j)\big)^2\Big]^{1/2} \tag{23.27}$$


(“root mean square” of the distances) as the match score between the template R and the current subimage, as suggested in [30]. Also, hierarchical variants of the chamfer method have been proposed to reduce the search effort as well as to increase robustness [31].
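The difference between the two scores shows up with two hypothetical sets of distance values found under the template's foreground pixels: both have the same linear score (Eqn. (23.25)), but the RMS score (Eqn. (23.27)) weights a few large distances more heavily than many small ones. A minimal sketch (the class name ChamferScores is an assumption):

```java
public class ChamferScores {

    // Linear chamfer score (Eqn. (23.25)): mean of the distance values d[k]
    // found under the K template foreground pixels.
    static double linear(double[] d) {
        double sum = 0;
        for (double x : d) sum += x;
        return sum / d.length;
    }

    // RMS chamfer score (Eqn. (23.27)): square root of the mean squared distance.
    static double rms(double[] d) {
        double sum = 0;
        for (double x : d) sum += x * x;
        return Math.sqrt(sum / d.length);
    }

    public static void main(String[] args) {
        double[] spread  = {1, 1, 1, 1};  // every template pixel slightly off
        double[] outlier = {0, 0, 0, 4};  // most pixels fit, one is far away
        System.out.println(linear(spread) + " " + linear(outlier));  // 1.0 1.0
        System.out.println(rms(spread) + " " + rms(outlier));        // 1.0 2.0
    }
}
```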


23.2.4 Java Implementation

The calculation of the distance transform, as described in Alg. 23.2, is implemented by the class DistanceTransform.⁶ Program 23.3 shows the complete code for the class ChamferMatcher for comparing binary images with the distance transform, which is a direct implementation of Alg. 23.3. Additional examples (ImageJ plugins) can be found in the online code repository.

6 Package imagingbook.pub.matching.
Fig. 23.11

Chamfer matching under varying scales. Binary search image with three circles of different diameters and three identical squares (a). The medium-sized circle at the top is used as the template (b). The result from a direct pixel comparison (c, e) and the result from chamfer matching (d, f). Again the chamfer match produces a much smoother score, which is most notable in the 3D plots shown in the bottom row (e, f). Notice that the three circles and the squares produce high match scores with similar absolute values (f).


23.3 Exercises

Exercise 23.1. Implement the chamfer-matching method (Alg. 23.3) for binary images using the Euclidean distance and the Manhattan distance.

Exercise 23.2. Implement the exact Euclidean distance transform using a “brute-force” search for each closest foreground pixel (this may take a while to compute). Compare your results with the approximation obtained with the chamfer method (Alg. 23.2), and compute the maximum deviation (as a percentage of the real distance).

Exercise 23.3. Modify the chamfer algorithm for computing the distance transform (Alg. 23.2) by replacing the 3 × 3 pixel Euclidean distance masks (Eqn. (23.23)) with the following masks of size 5 × 5:

$$M^{L} = \begin{bmatrix} \cdot & 2.236 & \cdot & 2.236 & \cdot \\ 2.236 & 1.414 & 1.000 & 1.414 & 2.236 \\ \cdot & 1.000 & \times & \cdot & \cdot \\ \cdot & \cdot & \cdot & \cdot & \cdot \\ \cdot & \cdot & \cdot & \cdot & \cdot \end{bmatrix} \tag{23.28}$$

$$M^{R} = \begin{bmatrix} \cdot & \cdot & \cdot & \cdot & \cdot \\ \cdot & \cdot & \cdot & \cdot & \cdot \\ \cdot & \cdot & \times & 1.000 & \cdot \\ 2.236 & 1.414 & 1.000 & 1.414 & 2.236 \\ \cdot & 2.236 & \cdot & 2.236 & \cdot \end{bmatrix} \tag{23.29}$$

Compare the results with those obtained with the standard masks. Why are no additional mask elements required along the coordinate axes and the diagonals?

Exercise 23.4. Implement the chamfer-matching technique using (a) the linear summation of distances (Eqn. (23.25)) and (b) the summation of squared distances (Eqn. (23.27)) for computing the match score. Select suitable test images to find out if version (b) is really more robust in terms of reducing the number of false matches.

Exercise 23.5. Adapt the template-matching method described in Sec. 23.1 for the comparison of RGB color images.
package imagingbook.pub.matching;

import ij.process.ByteProcessor;
import imagingbook.pub.matching.DistanceTransform.Norm;

public class ChamferMatcher {

  private final ByteProcessor I;
  private final int MI, NI;
  private final float[][] D;  // distance transform of I

  public ChamferMatcher(ByteProcessor I) {
    this(I, Norm.L2);
  }

  public ChamferMatcher(ByteProcessor I, Norm norm) {
    this.I = I;
    this.MI = this.I.getWidth();
    this.NI = this.I.getHeight();
    this.D = (new DistanceTransform(I, norm)).getDistanceMap();
  }

  public float[][] getMatch(ByteProcessor R) {
    final int MR = R.getWidth();
    final int NR = R.getHeight();
    final int[][] Ra = R.getIntArray();  // Ra[i][j] = R(i, j)
    float[][] Q = new float[MI - MR + 1][NI - NR + 1];
    for (int r = 0; r <= MI - MR; r++) {
      for (int s = 0; s <= NI - NR; s++) {
        float q = getMatchValue(Ra, r, s);
        Q[r][s] = q;
      }
    }
    return Q;
  }

  // Sums the distance values under all foreground pixels of R placed at (r, s).
  // The sum is not divided by K (cf. Eqn. (23.25)); for a fixed template
  // this does not change the position of the minimum.
  private float getMatchValue(int[][] R, int r, int s) {
    float q = 0.0f;
    for (int i = 0; i < R.length; i++) {
      for (int j = 0; j < R[i].length; j++) {
        if (R[i][j] > 0) {  // foreground pixel in reference image
          q = q + D[r + i][s + j];
        }
      }
    }
    return q;
  }

}

Prog. 23.3

Java implementation of Alg. 23.3 (class ChamferMatcher). The distance transform of the binary search image I is calculated in the constructor by an instance of class DistanceTransform and stored as a 2D float array. The method getMatch(R) computes the 2D match function Q (again as a float array) for the reference image R.
Non-Rigid Image Matching


The correlation-based registration methods described in Chapter 23 are rigid in the sense that they allow for translation as the only form of geometric transformation and positioning is limited to whole pixel units. In this chapter we look at methods that are capable of registering a reference image under (almost) arbitrary geometric transformations, such as changes in rotation, scale, and affine distortion, and also to sub-pixel accuracy.

At the core of this chapter is a detailed description of the classic Lucas-Kanade algorithm [154] and its efficient implementation. Unlike the methods presented earlier, the algorithms described here typically do not perform a global search over the entire image to find the best match, but start from an initial estimate of the geometric transformation and home in on the optimum position and distortion in an iterative fashion. Obtaining such an initial estimate is not difficult, for example, in tracking applications, where the approximate location of a particular image patch can be predicted from the observed motion in previous frames. Of course, the global matching methods described in Chapter 23 can also be used to find a coarse starting solution.


24.1 The Lucas-Kanade Technique

The basic idea of the Lucas-Kanade technique is best illustrated in the 1D case (see Fig. 24.1(a)).

24.1.1 Registration in 1D

Given two 1D, real-valued functions f(x), g(x), the registration problem is to find the disparity t in the (horizontal) x-direction under the assumption that g is a shifted version of f, that is,

$$g(x) = f(x - t). \tag{24.1}$$

If the function f is linear in a (sufficiently large) neighborhood of some point x with slope f'(x), then
Fig. 24.1

Registering two 1D functions (figure adapted from [154]). The 1D function g(x) is assumed to be a shifted version of f(x). In (a), f is approximately linear at position x, with slope f'(x) = dy/dx. Under this condition, the horizontal displacement t can be estimated from the difference of the local function values f(x) and g(x) as t ≈ (f(x) − g(x))/f'(x). In (b), the overall displacement t is calculated by averaging the individual displacement estimates from multiple samples in the region R = [x_a, x_b].


$$f(x - t) \approx f(x) - t \cdot f'(x) \tag{24.2}$$

and therefore

$$g(x) \approx f(x) - t \cdot f'(x). \tag{24.3}$$

Thus, given the function values f(x), g(x) and the first derivative f'(x) at some point x, the displacement t can be estimated (from Eqn. (24.3)) as

$$t \approx \frac{f(x) - g(x)}{f'(x)}. \tag{24.4}$$
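As a quick check of Eqn. (24.4), consider a purely illustrative linear function, for which the first-order approximation is exact:

$$\begin{aligned} f(x) &= 2x + 1, \quad t = 1.5 \;\Rightarrow\; g(x) = f(x - 1.5) = 2x - 2, \\ x = 4: \quad t &\approx \frac{f(4) - g(4)}{f'(4)} = \frac{9 - 6}{2} = 1.5. \end{aligned}$$

For a nonlinear f, the same computation gives only an approximation of t, whose quality depends on the chosen position x.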


Note that this can be viewed as a first-order Taylor expansion¹ of the function f. Obviously, the estimate of the shift t in Eqn. (24.4) depends only on a single pair of function samples at position x; it is inaccurate where f is not linear and fails where f is flat, that is, where the first derivative f' vanishes. To obtain a more robust displacement estimate, it appears natural to extend the calculation over a range R of sample values, thereby aligning a complete section of the two functions f and g (see Fig. 24.1(b)). This problem can be formulated as finding the displacement t that minimizes the L2 distance between the two functions f and g over the range R, that is, finding t such that

$$\varepsilon(t) = \sum_{x \in R} \big[f(x - t) - g(x)\big]^2 \approx \sum_{x \in R} \big[f(x) - t \cdot f'(x) - g(x)\big]^2 \tag{24.5}$$
1 See also Sec. C.3.2 in the Appendix.


is a minimum. This can be accomplished by calculating the first derivative of the above expression (with respect to t) and setting it equal to zero, which gives

$$\frac{\partial \varepsilon}{\partial t} \approx -2 \cdot \sum_{x \in R} f'(x) \cdot \big[f(x) - t \cdot f'(x) - g(x)\big] = 0. \tag{24.6}$$

By solving this equation for t, the optimal shift is found as

$$t_{\mathrm{opt}} = \Big[\sum_{x \in R} \big(f'(x)\big)^2\Big]^{-1} \cdot \sum_{x \in R} f'(x) \cdot \big[f(x) - g(x)\big]. \tag{24.7}$$


Note that this local estimation works even if the function f is flat at some positions in R, unless f'(x) is zero everywhere in R. However, since it is based only on a linear (i.e., first-order) prediction, the estimate is generally not accurate. To improve it, the following iterative optimization scheme is proposed in [154], which is really the basis of the Lucas-Kanade algorithm. With $t^{(0)} = t_{\mathrm{start}}$ as the initial estimate of the displacement (which may be zero), t is successively updated as

$$t^{(k)} \leftarrow t^{(k-1)} + \Big[\sum_{x \in R} \big(f'(x)\big)^2\Big]^{-1} \cdot \sum_{x \in R} f'(x) \cdot \big[f(x) - g(x + t^{(k-1)})\big] \tag{24.8}$$

for k = 1, 2, ..., until either $t^{(k)}$ converges or a maximum number of steps is reached.
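The iteration in Eqn. (24.8) can be sketched in a few lines of Java. In this minimal version (class and parameter names are assumptions), f, its derivative f', and g are passed as real-valued functions, so g can be evaluated directly at the shifted positions x + t^(k−1); for sampled signals, g would have to be interpolated instead. The synthetic test uses a Gaussian-shaped f and g(x) = f(x − 2.5).

```java
import java.util.function.DoubleUnaryOperator;

public class LucasKanade1D {

    // Iterative 1D registration following Eqns. (24.7) and (24.8):
    // finds t such that g(x) ≈ f(x - t) over the sample positions in R.
    static double register(DoubleUnaryOperator f, DoubleUnaryOperator df,
                           DoubleUnaryOperator g, double[] R,
                           double tStart, int maxIter, double eps) {
        // H = Σ f'(x)², computed once because it does not depend on t
        double H = 0;
        for (double x : R) H += df.applyAsDouble(x) * df.applyAsDouble(x);
        if (H == 0) throw new IllegalArgumentException("f is flat over R");
        double t = tStart;
        for (int k = 1; k <= maxIter; k++) {
            double b = 0;  // Σ f'(x) · [f(x) - g(x + t)]
            for (double x : R) {
                b += df.applyAsDouble(x) * (f.applyAsDouble(x) - g.applyAsDouble(x + t));
            }
            double dt = b / H;  // remaining shift, estimated as in Eqn. (24.7)
            t += dt;
            if (Math.abs(dt) < eps) break;
        }
        return t;
    }

    public static void main(String[] args) {
        double tTrue = 2.5;
        DoubleUnaryOperator f = x -> Math.exp(-(x - 50) * (x - 50) / 200.0);
        DoubleUnaryOperator df = x -> -(x - 50) / 100.0 * f.applyAsDouble(x);  // f'(x)
        DoubleUnaryOperator g = x -> f.applyAsDouble(x - tTrue);               // g(x) = f(x - t)
        double[] R = new double[61];  // sample region R = [20, 80]
        for (int i = 0; i < R.length; i++) R[i] = 20 + i;
        double t = register(f, df, g, R, 0.0, 100, 1e-9);
        System.out.println(t);  // converges close to 2.5
    }
}
```

Since the sum over f'(x)² does not depend on t, it is computed only once, outside the iteration loop.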


24.1.2 Extension to Multi-Dimensional Functions

As shown in [154], the formulation given in Sec. 24.1.1 can be easily generalized to align multi-dimensional, scalar-valued functions, including 2D images. In general, the involved functions $F(\boldsymbol{x})$ and $G(\boldsymbol{x})$ are now defined over $\mathbb{R}^m$, and thus all coordinates $\boldsymbol{x} = (x_1, \ldots, x_m)^\top$ and spatial shifts $\boldsymbol{t} = (t_1, \ldots, t_m)^\top$ are m-dimensional column vectors. The task is, analogous to Eqn. (24.5), to find the vector t that minimizes the error quantity


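$$\varepsilon(\boldsymbol{t}) = \sum_{\boldsymbol{x} \in R} \big[F(\boldsymbol{x} - \boldsymbol{t}) - G(\boldsymbol{x})\big]^2, \tag{24.9}$$

where R denotes an m-dimensional region. The linear approximation in Eqn. (24.2) becomes

$$F(\boldsymbol{x} - \boldsymbol{t}) \approx F(\boldsymbol{x}) - \nabla F(\boldsymbol{x}) \cdot \boldsymbol{t}, \tag{24.10}$$

where the row vector $\nabla F(\boldsymbol{x}) = \big(\frac{\partial F}{\partial x_1}(\boldsymbol{x}), \ldots, \frac{\partial F}{\partial x_m}(\boldsymbol{x})\big)$ is the m-dimensional gradient of the function F, evaluated at position x. Minimizing ε(t) over t is again accomplished by solving $\frac{\partial \varepsilon}{\partial \boldsymbol{t}} = \boldsymbol{0}$, that is (analogous to Eqn. (24.6)),

$$\frac{\partial \varepsilon}{\partial \boldsymbol{t}} \approx -2 \cdot \sum_{\boldsymbol{x} \in R} \nabla F^{\top}(\boldsymbol{x}) \cdot \big[F(\boldsymbol{x}) - \nabla F(\boldsymbol{x}) \cdot \boldsymbol{t} - G(\boldsymbol{x})\big] = \boldsymbol{0}. \tag{24.11}$$

The solution to Eqn. (24.11) is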
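$$\boldsymbol{t}_{\mathrm{opt}} = \Big[\sum_{\boldsymbol{x} \in R} \nabla F^{\top}(\boldsymbol{x}) \cdot \nabla F(\boldsymbol{x})\Big]^{-1} \cdot \sum_{\boldsymbol{x} \in R} \nabla F^{\top}(\boldsymbol{x}) \cdot \big[F(\boldsymbol{x}) - G(\boldsymbol{x})\big] \tag{24.12}$$

$$\boldsymbol{t}_{\mathrm{opt}} = \mathbf{H}_F^{-1} \cdot \sum_{\boldsymbol{x} \in R} \nabla F^{\top}(\boldsymbol{x}) \cdot \big[F(\boldsymbol{x}) - G(\boldsymbol{x})\big], \tag{24.13}$$

where $\mathbf{H}_F$ is an estimate of the m × m Hessian matrix² for the function F over the region R. Note the similarity of Eqn. (24.13) to the 1D version in Eqn. (24.7).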
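For 2D functions (m = 2), $\mathbf{H}_F$ in Eqn. (24.13) is a symmetric 2 × 2 matrix that can be inverted in closed form. The following sketch estimates a 2D translation this way and, as in the 1D scheme of Eqn. (24.8), repeats the step with G evaluated at the shifted positions x + t. All names, the Gaussian test function, and the true shift (1.2, −0.7) are assumptions chosen for illustration.

```java
import java.util.function.DoubleBinaryOperator;

public class LucasKanade2DShift {

    // Estimates t = (tx, ty) with G(x) ≈ F(x - t), using Eqn. (24.13) with
    // H_F = Σ ∇F^T ∇F (2 x 2), iterated as in the 1D scheme of Eqn. (24.8).
    // Fx, Fy are the partial derivatives of F; R holds sample points {x, y}.
    static double[] register(DoubleBinaryOperator F, DoubleBinaryOperator Fx,
                             DoubleBinaryOperator Fy, DoubleBinaryOperator G,
                             double[][] R, int maxIter, double eps) {
        double h11 = 0, h12 = 0, h22 = 0;
        for (double[] p : R) {
            double gx = Fx.applyAsDouble(p[0], p[1]), gy = Fy.applyAsDouble(p[0], p[1]);
            h11 += gx * gx;
            h12 += gx * gy;
            h22 += gy * gy;
        }
        double det = h11 * h22 - h12 * h12;
        if (det == 0) throw new IllegalArgumentException("H_F is singular");
        double tx = 0, ty = 0;
        for (int k = 0; k < maxIter; k++) {
            double b1 = 0, b2 = 0;  // Σ ∇F^T(x) · [F(x) - G(x + t)]
            for (double[] p : R) {
                double e = F.applyAsDouble(p[0], p[1]) - G.applyAsDouble(p[0] + tx, p[1] + ty);
                b1 += Fx.applyAsDouble(p[0], p[1]) * e;
                b2 += Fy.applyAsDouble(p[0], p[1]) * e;
            }
            // solve H_F · d = b with the explicit inverse of the symmetric 2 x 2 matrix
            double dx = ( h22 * b1 - h12 * b2) / det;
            double dy = (-h12 * b1 + h11 * b2) / det;
            tx += dx;
            ty += dy;
            if (Math.hypot(dx, dy) < eps) break;
        }
        return new double[] {tx, ty};
    }

    public static void main(String[] args) {
        double s2 = 2 * 5.0 * 5.0;  // Gaussian blob with sigma = 5, centered at (20, 20)
        DoubleBinaryOperator F  = (x, y) -> Math.exp(-((x - 20) * (x - 20) + (y - 20) * (y - 20)) / s2);
        DoubleBinaryOperator Fx = (x, y) -> -2 * (x - 20) / s2 * F.applyAsDouble(x, y);
        DoubleBinaryOperator Fy = (x, y) -> -2 * (y - 20) / s2 * F.applyAsDouble(x, y);
        DoubleBinaryOperator G  = (x, y) -> F.applyAsDouble(x - 1.2, y + 0.7);  // t = (1.2, -0.7)
        double[][] R = new double[41 * 41][];
        for (int i = 0; i < 41; i++)
            for (int j = 0; j < 41; j++) R[i * 41 + j] = new double[] {i, j};
        double[] t = register(F, Fx, Fy, G, R, 100, 1e-10);
        System.out.println(t[0] + ", " + t[1]);  // close to 1.2, -0.7
    }
}
```

If $\mathbf{H}_F$ is nearly singular, for example when all gradients in R point in the same direction, the shift cannot be determined reliably; the sketch only rejects an exactly singular matrix.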


24.2 The Lucas-Kanade Algorithm

Based on the ideas outlined in Sec. 24.1, the Lucas-Kanade algorithm [154] is not only capable of registering 2D images by finding the optimal translation but works for a range of geometric transformations $T_{\boldsymbol{p}}$ that can be parameterized by an n-dimensional vector $\boldsymbol{p}$. Among others, this includes affine and projective transformations (see Ch. 21) as the most important cases.

The same mathematical notation is used as in Chapter 23, that is, I denotes the search image and R is the (typically smaller) reference image. The placement and possible distortion of the matching image patch is described by a geometric transformation $T_{\boldsymbol{p}}$ (cf. Ch. 21), where $\boldsymbol{p}$ denotes a vector of transformation parameters. The goal of the Lucas-Kanade registration algorithm is to minimize the expression


$$\varepsilon(\boldsymbol{p}) = \sum_{\boldsymbol{x} \in R} \big[I(T_{\boldsymbol{p}}(\boldsymbol{x})) - R(\boldsymbol{x})\big]^2 \tag{24.14}$$

with respect to the geometric transformation parameters p, where I is the (search) image, R is the reference image (template), and $T_{\boldsymbol{p}}(\boldsymbol{x})$ is a geometric transformation or warp function with parameters p. For example, simple 2D translation is described by the transformation

$$T_{\boldsymbol{p}}(\boldsymbol{x}) = \boldsymbol{x} + \boldsymbol{p} = \begin{pmatrix} x + t_x \\ y + t_y \end{pmatrix}, \tag{24.15}$$

where $\boldsymbol{x} = (x, y)^\top$ and $\boldsymbol{p} = (t_x, t_y)^\top$. The task of the alignment process is to find the parameters that describe how to warp the search image I such that the match between I and R is optimal over the support region R. Figure 24.2 illustrates the corresponding geometry.

In each iteration, the Lucas-Kanade algorithm starts with an estimate of the transformation parameters p and attempts to find the parameter increment q that locally minimizes the expression

$$\varepsilon(\boldsymbol{q}) = \sum_{\boldsymbol{x} \in R} \big[I(T_{\boldsymbol{p}+\boldsymbol{q}}(\boldsymbol{x})) - R(\boldsymbol{x})\big]^2. \tag{24.16}$$

After calculating the optimal parameter change $\boldsymbol{q}_{\mathrm{opt}}$, the parameter vector p is updated in the form

$$\boldsymbol{p} \leftarrow \boldsymbol{p} + \boldsymbol{q}_{\mathrm{opt}} \tag{24.17}$$

2 See Sec. C.2.6 in the Appendix for details.


Fig. 24.2

Geometric relations in the (forward) Lucas-Kanade registration algorithm. I denotes the search image and R is the reference image. The mapping $T_{\boldsymbol{p}}$ warps the reference image R from its original position (centered at the origin) to R', with p being the initial parameter estimate. Matching is performed between the search image I and the warped reference image R'. $T_{\boldsymbol{p}+\boldsymbol{q}}$ is the improved warp; the optimal parameter change q is estimated in each iteration.


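until the process converges. Typically, the update loop is terminated when the magnitude of the change vector $\boldsymbol{q}_{\mathrm{opt}}$ drops below a predefined threshold.

The expression to be minimized in Eqn. (24.16) depends on the image content and is generally nonlinear with respect to q. A locally linear approximation of this function is obtained by the first-order Taylor expansion of I, that is,³

$$I(T_{\boldsymbol{p}+\boldsymbol{q}}(\boldsymbol{x})) \approx I(T_{\boldsymbol{p}}(\boldsymbol{x})) + \underbrace{\nabla I(T_{\boldsymbol{p}}(\boldsymbol{x}))}_{1 \times 2} \cdot \underbrace{\mathbf{J}_{T_{\boldsymbol{p}}}(\boldsymbol{x})}_{2 \times n} \cdot \underbrace{\boldsymbol{q}}_{n \times 1}, \tag{24.18}$$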

where  the  2D  (column)  vector 

V/(  x)  =  (lx(x),Iy(x))  (24.19) 

is the gradient of the image $I$ at some position $x$, and $J_{T_p}(x)$ denotes the Jacobian matrix⁴ of the warp function $T_p$, also evaluated at position $x$. In general, the Jacobian of a 2D warp function

$$T_p(x) = \bigl( T_{x,p}(x),\, T_{y,p}(x) \bigr)^\top \tag{24.20}$$

with $n$ parameters $p = (p_0, p_1, \ldots, p_{n-1})^\top$ is a $2 \times n$ matrix function

$$J_{T_p}(x) = \begin{pmatrix} \frac{\partial T_{x,p}}{\partial p_0}(x) & \frac{\partial T_{x,p}}{\partial p_1}(x) & \cdots & \frac{\partial T_{x,p}}{\partial p_{n-1}}(x) \\[4pt] \frac{\partial T_{y,p}}{\partial p_0}(x) & \frac{\partial T_{y,p}}{\partial p_1}(x) & \cdots & \frac{\partial T_{y,p}}{\partial p_{n-1}}(x) \end{pmatrix}. \tag{24.21}$$

3 In some of the following equations, we distinguish carefully between row and column vectors, and the dimensions of vectors and matrices are explicitly displayed (in underbraces) to avoid possible confusion.

4 The Jacobian $J$ of a function $f$ is a matrix containing the first partial derivatives of $f$, that is, it is a matrix of functions (see also Sec. C.2.1 in the Appendix).

With the linear approximation in Eqn. (24.18), the original minimization problem in Eqn. (24.14) can now be written as
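\begin{align}
\varepsilon(q) &\approx \sum_{x\in R} \bigl[ I(T_p(x)) + \nabla I(T_p(x)) \cdot J_{T_p}(x) \cdot q - R(x) \bigr]^2 \tag{24.22}\\
&= \sum_{u\in R} \bigl[ I(\tilde u) + \nabla I(\tilde u) \cdot J_{T_p}(u) \cdot q - R(u) \bigr]^2, \tag{24.23}
\end{align}

with $\tilde u = T_p(u)$. Finding the parameters $q$ that give the smallest difference $\varepsilon(q)$ is a linear least-squares minimization problem, which can be solved by taking the first partial derivative with respect to $q$, that is,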


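$$\underbrace{\frac{\partial \varepsilon}{\partial q}}_{n\times 1} = 2 \sum_{u\in R} \bigl[ \underbrace{\nabla I(\tilde u)}_{1\times 2} \cdot \underbrace{J_{T_p}(u)}_{2\times n} \bigr]^\top \cdot \bigl[ I(\tilde u) + \underbrace{\nabla I(\tilde u)}_{1\times 2} \cdot \underbrace{J_{T_p}(u)}_{2\times n} \cdot \underbrace{q}_{n\times 1} - R(u) \bigr], \tag{24.24}$$

and setting it equal to zero.⁵ Solving the resulting equation for the unknown $q$ yields the parameter change minimizing Eqn. (24.23) as

$$q_{opt} = H^{-1} \cdot \delta_p, \tag{24.25}$$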


where $H$ is an estimate of the Hessian matrix (see Eqns. (24.29)-(24.30)),

$$\delta_p = \sum_{u\in R} \bigl[ \nabla I(\tilde u) \cdot J_{T_p}(u) \bigr]^\top \cdot \bigl[ R(u) - I(\tilde u) \bigr] = \sum_{u\in R} \underbrace{s^\top(u)}_{n\times 1} \cdot \underbrace{D(u)}_{\in\,\mathbb{R}} \tag{24.26}$$

is an $n$-dimensional column vector, and

$$D(u) = R(u) - I(\tilde u) \tag{24.27}$$

is the resulting (scalar-valued) error image. $s(u) = (s_0(u), \ldots, s_{n-1}(u))$ is an $n$-dimensional row vector, with each element corresponding to one of the parameters in $p$. The 2D scalar fields formed by the individual components of the vector field $s(u)$,

$$s_0, \ldots, s_{n-1} : M_R \times N_R \mapsto \mathbb{R}, \tag{24.28}$$

are called steepest descent images for the current transformation parameters $p$.⁶ These images are of the same size as the reference image $R$. Finally, the $n \times n$ matrix


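$$H = \sum_{u\in R} \bigl[ \underbrace{\nabla I(\tilde u)}_{1\times 2} \cdot \underbrace{J_{T_p}(u)}_{2\times n} \bigr]^\top \cdot \bigl[ \underbrace{\nabla I(\tilde u)}_{1\times 2} \cdot \underbrace{J_{T_p}(u)}_{2\times n} \bigr] = \sum_{u\in R} \underbrace{s^\top(u)}_{n\times 1} \cdot \underbrace{s(u)}_{1\times n} \tag{24.29}$$

$$\approx \begin{pmatrix} \frac{\partial^2 D}{\partial p_0^2}(p) & \cdots & \frac{\partial^2 D}{\partial p_0\, \partial p_{n-1}}(p) \\ \vdots & \ddots & \vdots \\ \frac{\partial^2 D}{\partial p_{n-1}\, \partial p_0}(p) & \cdots & \frac{\partial^2 D}{\partial p_{n-1}^2}(p) \end{pmatrix} \tag{24.30}$$

in Eqn. (24.25) is an estimate of the Hessian matrix⁷ for the given transformation parameters $p$, calculated over all positions $u$ of the reference image $R$ (Eqn. (24.29)).

5 Note that in Eqn. (24.24) the left factor inside the summation is an $n$-dimensional column vector, while the right factor is a scalar.

6 The value $s_k(u)$ indicates the optimal change of parameter $p_k$ for the individual pixel position $u$ to achieve a steepest-descent optimization of Eqn. (24.23) (see [13, Sec. 4.3]).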
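As a small illustration of Eqns. (24.26) and (24.29), the following Java sketch accumulates $H$ and $\delta_p$ pixel by pixel from the steepest-descent row vectors $s(u)$ and the error values $D(u)$. The class name GaussNewtonSums is hypothetical (it is not part of the imagingbook library), and how $s(u)$ and $D(u)$ are obtained is left to the caller.

```java
/**
 * Hypothetical helper (not part of imagingbook): accumulates
 * H = sum_u s^T(u) s(u)      (Eqn. 24.29) and
 * delta = sum_u s^T(u) D(u)  (Eqn. 24.26)
 * from per-pixel steepest-descent rows s(u) and error values D(u).
 */
final class GaussNewtonSums {
    final double[][] H;     // n x n, symmetric by construction
    final double[] delta;   // n-dimensional column vector

    GaussNewtonSums(int n) {
        H = new double[n][n];
        delta = new double[n];
    }

    /** Adds the contribution of one pixel with steepest-descent row s and error d = D(u). */
    void add(double[] s, double d) {
        int n = delta.length;
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                H[i][j] += s[i] * s[j];   // outer product s^T * s
            }
            delta[i] += s[i] * d;         // s^T * D(u)
        }
    }
}
```

Accumulating on the fly means the steepest-descent images never have to be stored as a whole, which is also how Alg. 24.1 proceeds.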

The inverse of this matrix is used to calculate the optimal parameter change $q_{opt}$ in Eqn. (24.25). A better alternative to this formulation is to solve

$$H \cdot q_{opt} = \delta_p \tag{24.31}$$

for $q_{opt}$ as the unknown, without explicitly calculating $H^{-1}$. This is a system of linear equations in the standard form $A \cdot x = b$, which is numerically more stable and more efficient to solve than Eqn. (24.25).⁸
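A minimal Java sketch of solving Eqn. (24.31) directly, here by Gaussian elimination with partial pivoting. The class LinearSolver and the singularity threshold 1e-12 are illustrative assumptions, not the book's code; since $H$ is symmetric and positive semi-definite, a Cholesky or LU decomposition from a numerical library would be an equally valid (and usually preferable) choice.

```java
/**
 * Sketch: solves H * q = delta (Eqn. 24.31) by Gaussian elimination with
 * partial pivoting, without forming H^-1. Returns null if H is
 * (numerically) singular; the absolute threshold 1e-12 is an assumption.
 */
final class LinearSolver {

    static double[] solve(double[][] H, double[] delta) {
        int n = delta.length;
        // work on copies so that the caller's arrays stay unchanged
        double[][] a = new double[n][];
        for (int i = 0; i < n; i++) a[i] = H[i].clone();
        double[] b = delta.clone();

        for (int k = 0; k < n; k++) {
            // partial pivoting: choose the row with the largest |a[i][k]|
            int piv = k;
            for (int i = k + 1; i < n; i++) {
                if (Math.abs(a[i][k]) > Math.abs(a[piv][k])) piv = i;
            }
            if (Math.abs(a[piv][k]) < 1e-12) return null; // singular
            double[] tr = a[k]; a[k] = a[piv]; a[piv] = tr;
            double tb = b[k]; b[k] = b[piv]; b[piv] = tb;
            // eliminate the entries below the pivot
            for (int i = k + 1; i < n; i++) {
                double f = a[i][k] / a[k][k];
                for (int j = k; j < n; j++) a[i][j] -= f * a[k][j];
                b[i] -= f * b[k];
            }
        }
        // back substitution on the upper triangular system
        double[] q = new double[n];
        for (int i = n - 1; i >= 0; i--) {
            double sum = b[i];
            for (int j = i + 1; j < n; j++) sum -= a[i][j] * q[j];
            q[i] = sum / a[i][i];
        }
        return q;
    }
}
```

Returning null on a singular $H$ mirrors the "return nil" convention of the algorithms in this chapter.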


24.2.1 Summary of the Algorithm

In order not to get lost after this (quite mathematical) presentation, let us recap the key steps of the Lucas-Kanade method in a more compact form. In summary, given a search image $I$, a reference image $R$, a geometric transformation $T_p$, an initial parameter estimate $p_{init}$, and the convergence limit $\varepsilon$, the Lucas-Kanade algorithm performs the following steps:

A. Initialize:
   1. Calculate the gradient $\nabla I(u)$ of the search image $I$ for all image positions $u \in I$.
   2. Initialize the transformation parameters: $p \leftarrow p_{init}$.

B. Repeat:
   3. Calculate the warped gradient image $\nabla I'(u) = \nabla I(T_p(u))$ for each position $u \in R$ (by interpolation of $\nabla I$).
   4. Calculate the $(2 \times n)$ Jacobian matrix $J_{T_p}(u) = \frac{\partial T_p}{\partial p}(u)$ of the warp function $T_p(x)$ for each position $u \in R$ and the current parameter vector $p$ (see Eqn. (24.21)).
   5. Compute the $n$-dimensional row vectors $s_u = \nabla I'(u) \cdot J_{T_p}(u)$ for each position $u \in R$ (see Eqn. (24.26)).
   6. Compute the cumulative $n \times n$ Hessian matrix as $H = \sum_{u\in R} s_u^\top \cdot s_u$ (see Eqn. (24.29)).
   7. Calculate the error image $D(u) = R(u) - I(T_p(u))$ for each position $u \in R$ (by interpolation of $I$, see Eqn. (24.27)).
   8. Compute the column vector $\delta_p = \sum_{u\in R} s_u^\top \cdot D(u)$ (see Eqn. (24.26)).
   9. Calculate the optimal parameter change $q_{opt} = H^{-1} \cdot \delta_p$ (see Eqn. (24.25)).
   10. Update the transformation parameters: $p \leftarrow p + q_{opt}$ (see Eqn. (24.17)).

Until $\|q_{opt}\| < \varepsilon$.
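The steps above can be sketched in Java for the simplest case, a pure translation $p = (t_x, t_y)$ (see Sec. 24.4.1), where the Jacobian is the $2 \times 2$ identity and the $2 \times 2$ system in step 9 can be solved in closed form. This is a minimal illustration under stated assumptions, not the imagingbook implementation: images are double[width][height] arrays indexed [u][v], the gradient is estimated by central differences (rather than the Sobel filter used in Alg. 24.1), interpolation is bilinear, values outside the image are taken as 0, and the class name is hypothetical.

```java
/**
 * Sketch of the forward-additive Lucas-Kanade loop (steps 1-10) for a pure
 * translation p = (t_x, t_y), where J = I_2. Assumptions: images are
 * double[width][height] indexed [u][v], central-difference gradient,
 * bilinear interpolation, 0 outside the image.
 */
final class ForwardAdditiveTranslation {

    /** Bilinear interpolation of img at (x, y); returns 0 outside the image. */
    static double interpolate(double[][] img, double x, double y) {
        int u0 = (int) Math.floor(x), v0 = (int) Math.floor(y);
        if (u0 < 0 || v0 < 0 || u0 + 1 >= img.length || v0 + 1 >= img[0].length) return 0;
        double a = x - u0, b = y - v0;
        return (1 - a) * (1 - b) * img[u0][v0] + a * (1 - b) * img[u0 + 1][v0]
             + (1 - a) * b * img[u0][v0 + 1] + a * b * img[u0 + 1][v0 + 1];
    }

    /** Central-difference gradient {I_x, I_y}; border pixels are left at 0. */
    static double[][][] gradient(double[][] img) {
        int w = img.length, h = img[0].length;
        double[][][] g = new double[2][w][h];
        for (int u = 1; u < w - 1; u++) {
            for (int v = 1; v < h - 1; v++) {
                g[0][u][v] = 0.5 * (img[u + 1][v] - img[u - 1][v]);
                g[1][u][v] = 0.5 * (img[u][v + 1] - img[u][v - 1]);
            }
        }
        return g;
    }

    /** Returns the estimated (t_x, t_y), or null if H is singular or no convergence. */
    static double[] match(double[][] I, double[][] R, double[] pInit, double eps, int maxIter) {
        int mR = R.length, nR = R[0].length;
        double xc = 0.5 * (mR - 1), yc = 0.5 * (nR - 1); // origin at the center of R
        double[][][] grad = gradient(I);                    // step 1, computed once
        double[] p = pInit.clone();                         // step 2
        for (int i = 1; i <= maxIter; i++) {
            double h00 = 0, h01 = 0, h11 = 0, d0 = 0, d1 = 0;
            for (int u = 0; u < mR; u++) {
                for (int v = 0; v < nR; v++) {
                    double xw = (u - xc) + p[0], yw = (v - yc) + p[1]; // x' = T_p(x)
                    // steps 3-5: J = I_2, so s(u) is the interpolated gradient
                    double sx = interpolate(grad[0], xw, yw);
                    double sy = interpolate(grad[1], xw, yw);
                    double d = R[u][v] - interpolate(I, xw, yw);       // step 7: D(u)
                    h00 += sx * sx; h01 += sx * sy; h11 += sy * sy;    // step 6
                    d0 += sx * d;   d1 += sy * d;                      // step 8
                }
            }
            double det = h00 * h11 - h01 * h01;
            if (Math.abs(det) < 1e-12) return null;                // H not invertible
            double qx = (h11 * d0 - h01 * d1) / det;               // step 9 (2x2 closed form)
            double qy = (h00 * d1 - h01 * d0) / det;
            p[0] += qx;                                            // step 10
            p[1] += qy;
            if (Math.hypot(qx, qy) < eps) return p;               // ||q_opt|| < eps
        }
        return null;
    }
}
```

For other warps, the identity Jacobian would be replaced by the appropriate $2 \times n$ matrix and the closed-form step by a general linear solver; the loop structure stays the same.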


7 The Hessian matrix of an $n$-variable, real-valued function $f$ is composed of $f$'s second-order partial derivatives (see also Sec. C.2.6 in the Appendix). The Hessian matrix is symmetric (provided the second partial derivatives are continuous); the estimate $H$ in Eqn. (24.29) is symmetric by construction.

8 Moreover, Eqn. (24.31) may be solvable even if the matrix $H$ is almost singular and thus numerically not invertible [160, p. 164].
24 Non-Rigid Image Matching

Alg. 24.1
Lucas-Kanade ("forward-additive") registration algorithm. The origin of the reference image $R$ is placed at its center. The gradient of the image is calculated only once (line 6), but interpolated in every iteration (line 15). Also, the $n \times n$ Hessian matrix $H$ is calculated and inverted in every iteration. The Jacobian of the warp function $T$ is also evaluated repeatedly (line 16), though this is not an expensive calculation, at least for affine warps (lines 32-33). Procedure Interpolate(I, x') returns the interpolated value of the image $I$ at the continuous position $x' \in \mathbb{R}^2$ (see Ch. 22 for details and possible implementations).


1: LucasKanadeForward(I, R, T, p_init, ε, i_max)
   Input: I, the search image; R, the reference image; T, a 2D warp function that maps any point x ∈ ℝ² to some point x' = T_p(x), with transformation parameters p = (p_0, …, p_{n−1}); p_init, initial estimate of the warp parameters; ε, the error limit; i_max, the maximum number of iterations.
   Returns the modified warp parameter vector p for the best fit between I and R, or nil if no match could be found.
2:    (M_R, N_R) ← Size(R)                        ▷ size of the reference image R
3:    x_c ← 0.5 · (M_R − 1, N_R − 1)              ▷ center of R
4:    p ← p_init                                  ▷ initial transformation parameters
5:    n ← Length(p)                               ▷ parameter count
6:    (I_x, I_y) ← Gradient(I)                    ▷ calculate the gradient ∇I
7:    i ← 0                                       ▷ iteration counter
8:    do                                          ▷ main loop
9:       i ← i + 1
10:      H ← 0_{n,n}                              ▷ H ∈ ℝ^{n×n}, initialized to zero
11:      δ_p ← 0_n                                ▷ δ_p ∈ ℝ^n, initialized to zero
12:      for all positions u ∈ (M_R × N_R) do
13:         x ← u − x_c                           ▷ position w.r.t. the center of R
14:         x' ← T_p(x)                           ▷ warp x to x' by transf. T_p
            Estimate the gradient of I at the warped position x':
15:         ∇ ← (Interpolate(I_x, x'), Interpolate(I_y, x'))   ▷ 2D row vector
16:         J ← Jacobian(T_p, x)                  ▷ Jacobian of T_p at pos. x
17:         s ← (∇ · J)ᵀ                          ▷ s is a column vector of length n
18:         H' ← s · sᵀ                           ▷ outer product, H' is of size n × n
19:         H ← H + H'                            ▷ cumulate the Hessian (Eq. 24.30)
20:         d ← R(u) − Interpolate(I, x')         ▷ pixel difference d ∈ ℝ
21:         δ_p ← δ_p + s · d
22:      q_opt ← H⁻¹ · δ_p                        ▷ Eq. 24.25, or solve H · q_opt = δ_p (Eq. 24.31)
23:      p ← p + q_opt                            ▷ Eq. 24.17
24:   while (‖q_opt‖ > ε) ∧ (i < i_max)           ▷ repeat until convergence
25:   if i < i_max then
26:      return p
27:   else
28:      return nil

29: Gradient(I)
    Returns the gradient of I as a pair of maps.
30:   H_x ← (1/8) · [−1 0 1; −2 0 2; −1 0 1],   H_y ← (1/8) · [−1 −2 −1; 0 0 0; 1 2 1]
31:   return (I ∗ H_x, I ∗ H_y)

32: Jacobian(T_p, x)
    Returns the 2 × n Jacobian matrix of the 2D warp function T_p(x) = (T_{x,p}(x), T_{y,p}(x)) with parameters p = (p_0, …, p_{n−1}) for the spatial position x ∈ ℝ².
33:   return [ ∂T_{x,p}/∂p_0 (x)   ∂T_{x,p}/∂p_1 (x)   …   ∂T_{x,p}/∂p_{n−1} (x) ;
               ∂T_{y,p}/∂p_0 (x)   ∂T_{y,p}/∂p_1 (x)   …   ∂T_{y,p}/∂p_{n−1} (x) ]   ▷ see Eq. 24.21
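Procedure Gradient() (lines 29-31) can be sketched in Java as follows. The 1/8 scaling makes the Sobel responses approximate the first derivatives; the kernels are applied without mirroring (as a correlation), so intensities increasing to the right yield a positive $I_x$, and border pixels are handled by clamping coordinates. Both choices, as well as the class name SobelGradient and the double[width][height] image layout, are assumptions of this example.

```java
/**
 * Sketch of procedure Gradient() from Alg. 24.1: 3x3 Sobel kernels scaled
 * by 1/8, applied as a correlation (no kernel mirroring), with clamped
 * coordinates at the image border. Images are double[width][height], [u][v].
 */
final class SobelGradient {
    // kernels indexed [row j][column i], j = vertical offset, i = horizontal offset
    private static final double[][] HX = {{-1, 0, 1}, {-2, 0, 2}, {-1, 0, 1}};
    private static final double[][] HY = {{-1, -2, -1}, {0, 0, 0}, {1, 2, 1}};

    /** Returns {I_x, I_y}. */
    static double[][][] gradient(double[][] img) {
        int w = img.length, h = img[0].length;
        double[][] ix = new double[w][h], iy = new double[w][h];
        for (int u = 0; u < w; u++) {
            for (int v = 0; v < h; v++) {
                double gx = 0, gy = 0;
                for (int j = -1; j <= 1; j++) {          // vertical offset
                    for (int i = -1; i <= 1; i++) {      // horizontal offset
                        int uu = Math.min(w - 1, Math.max(0, u + i));
                        int vv = Math.min(h - 1, Math.max(0, v + j));
                        double val = img[uu][vv];
                        gx += HX[j + 1][i + 1] * val;
                        gy += HY[j + 1][i + 1] * val;
                    }
                }
                ix[u][v] = gx / 8;   // approximates dI/dx
                iy[u][v] = gy / 8;   // approximates dI/dy
            }
        }
        return new double[][][] {ix, iy};
    }
}
```

On a linear ramp $I(u, v) = 3u + 2v$, the interior values come out as exactly $I_x = 3$ and $I_y = 2$, which is a quick way to check the scaling.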
The complete specification of the Lucas-Kanade algorithm (referred to as the "forward-additive" algorithm in [13]) is given in Alg. 24.1. In addition to the two images $I$ and $R$, the procedure requires the assumed type of the geometric transformation $T$, the estimated initial transformation parameters $p_{init}$, a convergence limit $\varepsilon$, and the maximum number of iterations $i_{max}$. The optimal parameter vector $p$ is returned, or nil if the optimization did not converge. For better numerical stability, the origin of the reference image $R$ is placed at its center $x_c$ (see line 3), as is also illustrated in Fig. 24.2. Unlike the summary just given, the algorithm shows that it is sufficient to calculate the Jacobian $J$ (see line 16) and the contribution $s \cdot s^\top$ to the Hessian matrix $H$ (see line 18) only for the current position $u$ in the reference image, accumulating $H$ and $\delta_p$ on the fly, which implies relatively modest storage requirements. Additional instructions for calculating the Jacobian and Hessian matrices for specific linear transformations $T$ are described in Sec. 24.4. In the case that $H$ cannot be inverted (because it is singular) in line 22, the algorithm could either stop (and return nil) or continue with a small random perturbation of the transformation parameters $p$.

This so-called forward-additive algorithm performs reliably if the assumed type of geometric transformation is correct and the initial parameter estimate is sufficiently close to the actual parameters. However, it is computationally demanding, since the gradient image must be warped repeatedly and both the Jacobian $J_T$ and the Hessian matrix $H$ must be recalculated in each iteration. Very similar results with greatly improved performance are obtained with the "inverse compositional algorithm" described in Sec. 24.3.


24.3 Inverse Compositional Algorithm

This algorithm, described in [14], exchanges the roles of the search image $I$ and the reference image $R$. As illustrated in Fig. 24.3, the reference image $R$ remains anchored at the original position, while the geometric transformations are applied to (parts of) the search image $I$. In particular, the transformation $T_p$ now describes the mapping from the warped image $I'$ back to the original image $I$. The advantage of this algorithm is that it avoids re-evaluating the Jacobian and Hessian matrices in every iteration, while exhibiting convergence properties similar to the Lucas-Kanade (forward-additive) algorithm described in Sec. 24.2.

In this algorithm, the expression to be minimized in each iteration is (cf. Eqn. (24.16))

$$\varepsilon(q) = \sum_{u\in R} \bigl[ R(T_q(u)) - I(T_p(u)) \bigr]^2 . \tag{24.32}$$

It is minimized with respect to the parameter change $q$, producing an optimal change vector $q_{opt}$. Subsequently, the geometric transformation is updated not by simply adding $q_{opt}$ to the current parameter estimate $p$ (as in Eqn. (24.17)), but by concatenating the corresponding warps in the form

$$T_{p'}(x) = (T_q^{-1} \circ T_p)(x) = T_p\bigl(T_q^{-1}(x)\bigr), \tag{24.33}$$

where $\circ$ denotes the concatenation (successive application) of transformations. In the special (but frequent) case of linear geometric transformations, the concatenation is simply accomplished by multiplying the corresponding transformation matrices, that is,

$$M_{p'} = M_p \cdot M_q^{-1} \tag{24.34}$$

(see also Sec. 24.4.4). Also note that the "incremental" transformation $T_q$ is inverted before it is concatenated with the current warp $T_p$ to calculate the parameters of the resulting composite warp $T_{p'}$. Thus the geometric transformation $T$ must be invertible, but this is again no problem with (non-degenerate) linear warps, such as affine or projective transformations.

Fig. 24.3
Geometry of the inverse compositional registration algorithm. $I$ denotes the search image and $R$ is the reference image. The geometric transformation $T_p$ warps the image $I_p$ back to the original search image $I$, with $p$ being the initial parameter estimate. Matching is performed between the (unwarped) reference image $R$ and the warped search image $I_p$. Note that the reference image $R$ always remains anchored at the origin. In each iteration, the incremental warp $T_q$ (with parameter vector $q$) is estimated, mapping the image $I_p$ to image $I_{p'}$. The resulting composite warp $T_{p'}$ (mapping $I_{p'}$ back to $I$) with parameters $p'$ is obtained by concatenating the transformations $T_q^{-1}$ and $T_p$.

In summary, given a search image $I$, a reference image $R$, a geometric transformation $T_p$, an initial parameter estimate $p_{init}$, and the convergence limit $\varepsilon$, the "inverse compositional algorithm" performs the following steps:


A. Initialize:
   1. Calculate the gradient $\nabla R(x)$ of the reference image $R$ for all $x \in R$.
   2. Calculate the Jacobian $J(x) = \frac{\partial T_p}{\partial p}(x)$ of the warp function $T_p(x)$ for all $x \in R$, with $p = 0$.
   3. Compute $s_x = \nabla R(x) \cdot J(x)$ for all $x \in R$.
   4. Calculate the Hessian matrix as $H = \sum_{x\in R} s_x^\top \cdot s_x$ and pre-calculate its inverse $H^{-1}$.
   5. Initialize the transformation parameters: $p \leftarrow p_{init}$.

B. Repeat:
   6. Warp the search image $I$ to $I'$, such that $I'(x) = I(T_p(x))$, for all $x \in R$.
   7. Compute the (column) vector $\delta_p = \sum_{x\in R} s_x^\top \cdot [I'(x) - R(x)]$.
   8. Estimate the optimal parameter change $q_{opt} = H^{-1} \cdot \delta_p$.
   9. Find the warp parameters $p'$, such that $T_{p'} = T_{q_{opt}}^{-1} \circ T_p$.
   10. Update the warp parameters: $p \leftarrow p'$.

Until $\|q_{opt}\| < \varepsilon$.
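For a pure translation, the inverse compositional steps above become particularly simple: with $p = 0$ the Jacobian is the identity, so $s(u) = \nabla R(u)$, and composing $T_p$ with the inverted increment gives $T_p(T_q^{-1}(x)) = x - q + p$, i.e., $p' = p - q$. The following Java sketch illustrates this under these assumptions: images as double[width][height] arrays, a central-difference gradient of $R$ with clamped border indices (instead of the Sobel filter of Alg. 24.1), and bilinear interpolation of $I$. The class name is hypothetical.

```java
/**
 * Sketch of the inverse compositional algorithm for a pure translation
 * p = (t_x, t_y). Steps 1-4 (steepest-descent images and H) are done once;
 * the update T_p(T_q^-1(x)) = x - q + p amounts to p <- p - q.
 */
final class InverseCompositionalTranslation {

    /** Bilinear interpolation of img at (x, y); returns 0 outside the image. */
    static double interpolate(double[][] img, double x, double y) {
        int u0 = (int) Math.floor(x), v0 = (int) Math.floor(y);
        if (u0 < 0 || v0 < 0 || u0 + 1 >= img.length || v0 + 1 >= img[0].length) return 0;
        double a = x - u0, b = y - v0;
        return (1 - a) * (1 - b) * img[u0][v0] + a * (1 - b) * img[u0 + 1][v0]
             + (1 - a) * b * img[u0][v0 + 1] + a * b * img[u0 + 1][v0 + 1];
    }

    /** Returns the estimated (t_x, t_y), or null if H is singular or no convergence. */
    static double[] match(double[][] I, double[][] R, double[] pInit, double eps, int maxIter) {
        int mR = R.length, nR = R[0].length;
        double xc = 0.5 * (mR - 1), yc = 0.5 * (nR - 1);
        // steps 1-4, done once: s(u) = grad R(u) (J = I_2 for p = 0) and H
        double[][] sx = new double[mR][nR], sy = new double[mR][nR];
        double h00 = 0, h01 = 0, h11 = 0;
        for (int u = 0; u < mR; u++) {
            for (int v = 0; v < nR; v++) {
                // central differences, indices clamped at the border
                sx[u][v] = 0.5 * (R[Math.min(u + 1, mR - 1)][v] - R[Math.max(u - 1, 0)][v]);
                sy[u][v] = 0.5 * (R[u][Math.min(v + 1, nR - 1)] - R[u][Math.max(v - 1, 0)]);
                h00 += sx[u][v] * sx[u][v];
                h01 += sx[u][v] * sy[u][v];
                h11 += sy[u][v] * sy[u][v];
            }
        }
        double det = h00 * h11 - h01 * h01;
        if (Math.abs(det) < 1e-12) return null;              // H not invertible
        double[] p = pInit.clone();                          // step 5
        for (int i = 1; i <= maxIter; i++) {
            double d0 = 0, d1 = 0;
            for (int u = 0; u < mR; u++) {
                for (int v = 0; v < nR; v++) {
                    // step 6: I'(x) = I(T_p(x)) with x = u - x_c
                    double iw = interpolate(I, (u - xc) + p[0], (v - yc) + p[1]);
                    double d = iw - R[u][v];                  // I'(x) - R(x)
                    d0 += sx[u][v] * d;                       // step 7
                    d1 += sy[u][v] * d;
                }
            }
            double qx = (h11 * d0 - h01 * d1) / det;         // step 8: q = H^-1 * delta
            double qy = (h00 * d1 - h01 * d0) / det;
            p[0] -= qx;                                      // steps 9-10: p' = p - q
            p[1] -= qy;
            if (Math.hypot(qx, qy) < eps) return p;
        }
        return null;
    }
}
```

Note that nothing inside the main loop depends on a gradient of $I$; only $I$ itself is interpolated, which is where the speed advantage of this variant comes from.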


1: LucasKanadeInverse(I, R, T, p_init, ε, i_max)
   Input: I, the search image; R, the reference image; T, a 2D warp function that maps any point x ∈ ℝ² to x' = T_p(x) using parameters p = (p_0, …, p_{n−1}); p_init, initial estimate of the warp parameters; ε, the error limit (typ. ε = 10⁻³); i_max, the maximum number of iterations.
   Returns the updated warp parameter vector p for the best fit between I and R, or nil if no match could be found.
2:    (M_R, N_R) ← Size(R)                         ▷ size of the reference image R
3:    x_c ← 0.5 · (M_R − 1, N_R − 1)               ▷ center of R
      Initialize:
4:    n ← Length(p_init)                           ▷ parameter count n
5:    Create map S: (M_R × N_R) ↦ ℝ^n              ▷ n "steepest-descent images"
6:    (R_x, R_y) ← Gradient(R)                     ▷ (R_x(u), R_y(u))ᵀ = ∇R(u)
7:    H ← 0_{n,n}                                  ▷ initialize n × n Hessian matrix to zero
8:    for all positions u ∈ (M_R × N_R) do
9:       x ← u − x_c                               ▷ centered position
10:      ∇R ← (R_x(u), R_y(u))                     ▷ 2-dimensional row vector
11:      J ← Jacobian(T_0, x)                      ▷ Jacob. of T at pos. x with p = 0
12:      s ← (∇R · J)ᵀ                             ▷ s is a column vector of length n
13:      S(u) ← s                                  ▷ keep s for later use
14:      H' ← s · sᵀ                               ▷ outer product, H' is of size n × n
15:      H ← H + H'                                ▷ cumulate the Hessian (Eq. 24.30)
16:   H⁻¹ ← Inverse(H)
17:   if H⁻¹ = nil then                            ▷ H could not be inverted
18:      return nil                                ▷ stop
19:   p ← p_init                                   ▷ initial parameter estimate
20:   i ← 0                                        ▷ iteration counter
      Main loop:
21:   do
22:      i ← i + 1
23:      δ_p ← 0_n                                 ▷ δ_p ∈ ℝ^n, initialized to zero
24:      for all positions u ∈ (M_R × N_R) do
25:         x ← u − x_c                            ▷ centered position
26:         x' ← T_p(x)                            ▷ warp I to I'
27:         d ← Interpolate(I, x') − R(u)          ▷ pixel difference d ∈ ℝ
28:         s ← S(u)                               ▷ get pre-calculated s
29:         δ_p ← δ_p + s · d
30:      q_opt ← H⁻¹ · δ_p                         ▷ H⁻¹ is pre-calculated in line 16
31:      p' ← determine, such that T_{p'}(x) = T_p(T_{q_opt}⁻¹(x))
32:      p ← p'
33:   while (‖q_opt‖ > ε) ∧ (i < i_max)            ▷ repeat until convergence
34:   return p if i < i_max, nil otherwise



Alg. 24.2
Inverse compositional registration algorithm. The gradient vectors $\nabla R(u)$ of the reference image $R$ are calculated only once (line 6) using procedure Gradient(), as defined in Alg. 24.1. The Jacobian matrix $J$ of the warp function $T_p$ is also evaluated only once (line 11) for $p = 0$ (i.e., the identity mapping) over all positions of the reference image $R$. Similarly, the Hessian matrix $H$ and its inverse $H^{-1}$ are calculated only once (lines 15, 16). $H^{-1}$ is used to calculate the optimal parameter change vector $q_{opt}$ in line 30 of the main loop. Procedure Interpolate() in line 27 is the same as in Alg. 24.1. This algorithm is typically about 5-10 times faster than the original Lucas-Kanade (forward) algorithm (see Alg. 24.1), with similar convergence properties.


One can clearly see that in this variant several steps are performed only once at initialization and do not appear inside the main loop. A detailed and concise listing of the inverse compositional algorithm is given in Alg. 24.2, and concrete setups for various linear transformations are described in Sec. 24.4. Since the Jacobian matrix (for the null parameter vector $p = 0$) and the Hessian matrix are calculated only once during initialization, this algorithm executes significantly faster than the original Lucas-Kanade (forward-additive) algorithm, while offering similar convergence properties.


24.4 Parameter Setups for Various Linear Transformations

The use of linear transformations for the geometric mapping $T$ is very common. In the following, we describe the detailed setups required by the Lucas-Kanade algorithm for various geometric transformations, such as pure translation as well as affine and projective transformations. This should help to reduce the chance of confusion about the content and structure of the involved vectors and matrices. For additional details and concrete implementations of these transformations, readers should consult the associated Java source code in the imagingbook⁹ library.


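24.4.1 Pure Translation

In the case of pure 2D translation, we have $n = 2$ parameters $t_x, t_y$ and the geometric transformation is (see Eqn. (24.15))

$$x' = T_p(x) = x + \begin{pmatrix} t_x \\ t_y \end{pmatrix}, \tag{24.35}$$

with the parameter vector $p = (p_0, p_1)^\top = (t_x, t_y)^\top$ and $x = (x, y)^\top$. Thus the two component functions of the transformation (cf. Eqn. (24.18)) are

$$T_{x,p}(x) = x + t_x, \qquad T_{y,p}(x) = y + t_y, \tag{24.36}$$

with the $2 \times 2$ Jacobian matrix

$$J_{T_p}(x) = \begin{pmatrix} \frac{\partial T_{x,p}}{\partial t_x}(x) & \frac{\partial T_{x,p}}{\partial t_y}(x) \\[4pt] \frac{\partial T_{y,p}}{\partial t_x}(x) & \frac{\partial T_{y,p}}{\partial t_y}(x) \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = I_2 . \tag{24.37}$$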


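Note that in this case $J_{T_p}(x)$ is constant,¹⁰ that is, independent of the position $x$ and the parameters $p$. The 2D column vector $\delta_p$ (Eqn. (24.26)) is calculated as

\begin{align}
\delta_p &= \sum_{u\in R} \bigl[ \nabla I(\tilde u) \cdot J_{T_p}(u) \bigr]^\top \cdot \bigl[ R(u) - I(T_p(u)) \bigr] = \sum_{u\in R} \underbrace{\bigl[ \nabla I(\tilde u) \cdot J_{T_p}(u) \bigr]^\top}_{s^\top(u)} \cdot \underbrace{D(u)}_{\in\,\mathbb{R}} \tag{24.38}\\
&= \sum_{u\in R} \Bigl[ \bigl( I_x(\tilde u), I_y(\tilde u) \bigr) \cdot \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \Bigr]^\top \cdot D(u) = \sum_{u\in R} \begin{pmatrix} I_x(\tilde u) \\ I_y(\tilde u) \end{pmatrix} \cdot D(u) \tag{24.39}\\
&= \begin{pmatrix} \sum_u I_x(\tilde u) \cdot D(u) \\ \sum_u I_y(\tilde u) \cdot D(u) \end{pmatrix} = \begin{pmatrix} \sum_u s_0(u) \cdot D(u) \\ \sum_u s_1(u) \cdot D(u) \end{pmatrix} = \begin{pmatrix} \delta_0 \\ \delta_1 \end{pmatrix}, \tag{24.40}
\end{align}

where $I_x, I_y$ denote the (estimated) first derivatives of the search image $I$ in x- and y-direction, respectively.¹¹ Thus in this case the steepest descent images (Eqn. (24.28)) $s_0(x) = I_x(x)$ and $s_1(x) = I_y(x)$ are simply the components of the interpolated gradient of $I$ in the region of the shifted reference image. The associated Hessian matrix (Eqn. (24.29)) is calculated as

$$H = \sum_{u\in R} s^\top(u) \cdot s(u) = \sum_{u\in R} \begin{pmatrix} I_x^2(\tilde u) & I_x(\tilde u)\, I_y(\tilde u) \\ I_x(\tilde u)\, I_y(\tilde u) & I_y^2(\tilde u) \end{pmatrix} = \begin{pmatrix} H_{00} & H_{01} \\ H_{10} & H_{11} \end{pmatrix},$$

again with $\tilde u = T_p(u)$. Since $H$ is symmetric ($H_{01} = H_{10}$) and only of size $2 \times 2$, its inverse can be easily obtained in closed form:

$$H^{-1} = \frac{1}{H_{00} H_{11} - H_{01} H_{10}} \begin{pmatrix} H_{11} & -H_{01} \\ -H_{10} & H_{00} \end{pmatrix}.$$

The resulting optimal parameter increment (see Eqn. (24.25)) is

$$q_{opt} = \begin{pmatrix} t'_x \\ t'_y \end{pmatrix} = H^{-1} \cdot \delta_p = \frac{1}{H_{00} H_{11} - H_{01} H_{10}} \begin{pmatrix} H_{11}\, \delta_0 - H_{01}\, \delta_1 \\ H_{00}\, \delta_1 - H_{10}\, \delta_0 \end{pmatrix}, \tag{24.48}$$

with $\delta_0, \delta_1$ as defined in Eqn. (24.40). Alternatively, the same result could be obtained by solving Eqn. (24.31) for $q_{opt}$.

9 Package imagingbook.pub.geometry.mappings.

10 $I_2$ denotes the $2 \times 2$ identity matrix.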
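Eqns. (24.40) and (24.48) translate directly into code. This Java sketch (hypothetical class TranslationIncrement) takes the interpolated gradient components and the error image $D$ over the reference region as equally sized arrays and returns $q_{opt}$ in closed form, or null when $H$ is (numerically) singular; the threshold 1e-12 is an arbitrary assumption.

```java
/**
 * Sketch of Eqns. (24.40) and (24.48) for pure translation: accumulates
 * delta = (delta_0, delta_1) and the symmetric 2x2 Hessian from the
 * interpolated gradient components ix, iy and the error image D (all of
 * equal size), then returns q_opt = (t'_x, t'_y), or null if H is singular.
 */
final class TranslationIncrement {

    static double[] compute(double[][] ix, double[][] iy, double[][] D) {
        double h00 = 0, h01 = 0, h11 = 0, d0 = 0, d1 = 0;
        for (int u = 0; u < D.length; u++) {
            for (int v = 0; v < D[u].length; v++) {
                double gx = ix[u][v], gy = iy[u][v], d = D[u][v];
                h00 += gx * gx;  h01 += gx * gy;  h11 += gy * gy;  // H_00, H_01 (= H_10), H_11
                d0 += gx * d;    d1 += gy * d;                     // delta_0, delta_1
            }
        }
        double det = h00 * h11 - h01 * h01;
        if (Math.abs(det) < 1e-12) return null;
        return new double[] {
            (h11 * d0 - h01 * d1) / det,   // t'_x
            (h00 * d1 - h01 * d0) / det    // t'_y
        };
    }
}
```

The singular case arises, for example, when all gradients in the region point in the same direction (the aperture problem), so the translation along the edge is undetermined.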


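24.4.2 Affine Transformation

An affine transformation in 2D can be expressed (for example) with homogeneous coordinates¹² in the form

$$\begin{pmatrix} x' \\ y' \\ 1 \end{pmatrix} = \begin{pmatrix} 1+a & b & t_x \\ c & 1+d & t_y \\ 0 & 0 & 1 \end{pmatrix} \cdot \begin{pmatrix} x \\ y \\ 1 \end{pmatrix}, \tag{24.50}$$

with $n = 6$ parameters $p = (p_0, \ldots, p_5)^\top = (a, b, c, d, t_x, t_y)^\top$. This parameterization of the affine transformation implies that the null parameter vector ($p = 0$) corresponds to the identity transformation.

11 See Sec. C.3.1 in the Appendix for how to estimate gradients of discrete images.

12 See also Chapter 21, Secs. 21.1.2 and 21.1.3.

The component functions of this transformation thus are

$$T_{x,p}(x) = (1+a) \cdot x + b \cdot y + t_x, \qquad T_{y,p}(x) = c \cdot x + (1+d) \cdot y + t_y, \tag{24.51}$$

and the associated Jacobian matrix at some position $x = (x, y)$ is

$$J_{T_p}(x) = \begin{pmatrix} \frac{\partial T_{x,p}}{\partial a}(x) & \frac{\partial T_{x,p}}{\partial b}(x) & \cdots & \frac{\partial T_{x,p}}{\partial t_y}(x) \\[4pt] \frac{\partial T_{y,p}}{\partial a}(x) & \frac{\partial T_{y,p}}{\partial b}(x) & \cdots & \frac{\partial T_{y,p}}{\partial t_y}(x) \end{pmatrix} \tag{24.52}$$

$$= \begin{pmatrix} x & y & 0 & 0 & 1 & 0 \\ 0 & 0 & x & y & 0 & 1 \end{pmatrix}. \tag{24.53}$$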


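Note that in this case the Jacobian only depends on the position $x = (x, y)$, not on the transformation parameters $p$. It can thus be pre-calculated once for all positions $x$ of the reference image $R$. The 6-dimensional column vector $\delta_p$ (Eqn. (24.26)) is obtained as

\begin{align}
\delta_p &= \sum_{u\in R} \bigl[ \nabla I(\tilde u) \cdot J_{T_p}(u) \bigr]^\top \cdot D(u) = \sum_{u\in R} \Bigl[ \bigl( I_x(\tilde u), I_y(\tilde u) \bigr) \cdot \begin{pmatrix} x & y & 0 & 0 & 1 & 0 \\ 0 & 0 & x & y & 0 & 1 \end{pmatrix} \Bigr]^\top \cdot D(u) \tag{24.55}\\
&= \sum_{u\in R} \bigl( I_x(\tilde u)\, x,\; I_x(\tilde u)\, y,\; I_y(\tilde u)\, x,\; I_y(\tilde u)\, y,\; I_x(\tilde u),\; I_y(\tilde u) \bigr)^\top \cdot D(u), \tag{24.57}
\end{align}

where $(x, y)$ are the coordinates of position $u$ relative to the center of $R$, again with $\tilde u = T_p(u)$. The corresponding Hessian matrix (of size $6 \times 6$) is found as

$$H = \sum_{u\in R} s^\top(u) \cdot s(u), \quad \text{with} \quad s(u) = \bigl( I_x(\tilde u)\, x,\; I_x(\tilde u)\, y,\; I_y(\tilde u)\, x,\; I_y(\tilde u)\, y,\; I_x(\tilde u),\; I_y(\tilde u) \bigr).$$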
Finally, the optimal parameter increment (see Eqn. (24.25)) is calculated as

$$q_{opt} = (a', b', c', d', t'_x, t'_y)^\top = H^{-1} \cdot \delta_p \tag{24.61}$$

or, equivalently, by solving $H \cdot q_{opt} = \delta_p$ (see Eqn. (24.31)). In both cases there is no convenient closed-form solution (unlike the $2 \times 2$ case of pure translation), so numerical methods must be used.
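Because the affine Jacobian in Eqn. (24.53) depends only on the position, the steepest-descent row $s(u)$ is just the gradient weighted by the coordinates. A small Java sketch follows (hypothetical class AffineWarp, parameters ordered as $(a, b, c, d, t_x, t_y)$); the resulting $6 \times 6$ system $H \cdot q_{opt} = \delta_p$ would then be solved numerically, for example by Gaussian elimination or a Cholesky decomposition.

```java
/**
 * Sketch for the affine warp of Eqn. (24.50) with p = (a, b, c, d, t_x, t_y).
 * Hypothetical helper, not the imagingbook AffineMapping class.
 */
final class AffineWarp {

    /** Applies T_p to the point (x, y), Eqn. (24.51). */
    static double[] apply(double[] p, double x, double y) {
        return new double[] {
            (1 + p[0]) * x + p[1] * y + p[4],
            p[2] * x + (1 + p[3]) * y + p[5]
        };
    }

    /** 2 x 6 Jacobian of Eqn. (24.53); depends on (x, y) only, not on p. */
    static double[][] jacobian(double x, double y) {
        return new double[][] {
            {x, y, 0, 0, 1, 0},
            {0, 0, x, y, 0, 1}
        };
    }

    /** Steepest-descent row s = (I_x, I_y) * J for a pixel at (x, y). */
    static double[] steepestDescent(double ix, double iy, double x, double y) {
        double[][] J = jacobian(x, y);
        double[] s = new double[6];
        for (int k = 0; k < 6; k++) {
            s[k] = ix * J[0][k] + iy * J[1][k];
        }
        return s;
    }
}
```

Since the Jacobian does not depend on $p$, an implementation could tabulate it once per reference position, as noted above.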


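24.4.3 Projective Transformation

A projective transformation¹³ can be expressed (for example) with homogeneous coordinates in the form

$$\begin{pmatrix} \hat{x}' \\ \hat{y}' \\ \hat{w}' \end{pmatrix} = \begin{pmatrix} 1+a & b & t_x \\ c & 1+d & t_y \\ e & f & 1 \end{pmatrix} \cdot \begin{pmatrix} x \\ y \\ 1 \end{pmatrix}, \tag{24.62}$$

with $n = 8$ parameters $p = (p_0, \ldots, p_7)^\top = (a, b, c, d, e, f, t_x, t_y)^\top$. Again the null parameter vector corresponds to the identity transformation. In this case, the results need to be converted back to non-homogeneous coordinates (see Ch. 21, Sec. 21.1.2), which yields the transformation's effective (nonlinear) component functions

$$T_{x,p}(x) = \frac{\hat{x}'}{\hat{w}'} = \frac{(1+a) \cdot x + b \cdot y + t_x}{e \cdot x + f \cdot y + 1}, \qquad T_{y,p}(x) = \frac{\hat{y}'}{\hat{w}'} = \frac{c \cdot x + (1+d) \cdot y + t_y}{e \cdot x + f \cdot y + 1}.$$

In this case, the associated Jacobian matrix for position $x = (x, y)$,

$$J_{T_p}(x) = \frac{1}{w} \begin{pmatrix} x & y & 0 & 0 & -x\, T_{x,p}(x) & -y\, T_{x,p}(x) & 1 & 0 \\ 0 & 0 & x & y & -x\, T_{y,p}(x) & -y\, T_{y,p}(x) & 0 & 1 \end{pmatrix}, \quad \text{with } w = e \cdot x + f \cdot y + 1,$$

depends on both the position $x$ as well as the transformation parameters $p$. The setup for the resulting Hessian matrix $H$ is analogous to Eqns. (24.58)-(24.61).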
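Hand-derived Jacobians like the one above are easy to get wrong, so it can help to evaluate them numerically. The following Java sketch (hypothetical class ProjectiveWarp, not the imagingbook ProjectiveMapping class) computes $T_p$ and its $2 \times 8$ Jacobian for $p = (a, b, c, d, e, f, t_x, t_y)$; comparing against central finite differences is a cheap way to validate such derivatives.

```java
/**
 * Sketch for the projective warp of Eqn. (24.62) with
 * p = (a, b, c, d, e, f, t_x, t_y). Hypothetical helper class.
 */
final class ProjectiveWarp {

    /** Applies T_p to (x, y), including the homogeneous division by w. */
    static double[] apply(double[] p, double x, double y) {
        double w = p[4] * x + p[5] * y + 1;
        double xp = ((1 + p[0]) * x + p[1] * y + p[6]) / w;
        double yp = (p[2] * x + (1 + p[3]) * y + p[7]) / w;
        return new double[] {xp, yp};
    }

    /** 2 x 8 Jacobian of T_p with respect to p at (x, y); depends on both p and (x, y). */
    static double[][] jacobian(double[] p, double x, double y) {
        double w = p[4] * x + p[5] * y + 1;
        double[] t = apply(p, x, y);
        double xp = t[0], yp = t[1];
        return new double[][] {
            {x / w, y / w, 0, 0, -x * xp / w, -y * xp / w, 1 / w, 0},
            {0, 0, x / w, y / w, -x * yp / w, -y * yp / w, 0, 1 / w}
        };
    }
}
```

For $e = f = 0$ (so $w = 1$), the first four and last two columns reduce to the affine Jacobian of Eqn. (24.53).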


24.4.4 Concatenating Linear Transformations

The "inverse compositional" algorithm described in Sec. 24.3 requires the concatenation of geometric transformations (see Eqn. (24.33)). In particular, if $T_p, T_q$ are linear transformations (in homogeneous coordinates, see Eqn. (24.62)), with associated transformation matrices $M_p$ and $M_q$ (such that $T_p(x) = M_p \cdot x$ and $T_q(x) = M_q \cdot x$, respectively), the matrix for the concatenated transformation

$$T_{p'}(x) = (T_p \circ T_q)(x) = T_q(T_p(x)) \tag{24.69}$$

is simply the product of the original matrices, that is,

$$M_{p'} \cdot x = M_q \cdot M_p \cdot x. \tag{24.70}$$

The resulting parameter vector $p'$ for the composite transformation $T_{p'}$ can be simply extracted from the corresponding elements of the matrix $M_{p'}$ (see Eqns. (24.50) and (24.62), respectively); for projective transformations, $M_{p'}$ must first be scaled such that its bottom-right element is 1.

13 See also Chapter 21, Sec. 21.1.4.
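A minimal sketch of the parameter update in step 9 of the inverse compositional algorithm for linear warps: build $3 \times 3$ matrices from the parameter vectors, compute $M_{p'} = M_p \cdot M_q^{-1}$ (Eqn. (24.34)), normalize so that the bottom-right element is 1, and read back $p'$. The class and method names are hypothetical; parameters use the projective ordering $(a, b, c, d, e, f, t_x, t_y)$, so affine warps simply have $e = f = 0$ and translations have only $t_x, t_y$ set.

```java
/**
 * Sketch: parameter update p' with T_p'(x) = T_p(T_q^-1(x)), i.e.
 * M_p' = M_p * M_q^-1 (Eqns. 24.33, 24.34), for p = (a, b, c, d, e, f, t_x, t_y).
 */
final class LinearComposition {

    /** M_p as in Eqn. (24.62). */
    static double[][] toMatrix(double[] p) {
        return new double[][] {
            {1 + p[0], p[1],     p[6]},
            {p[2],     1 + p[3], p[7]},
            {p[4],     p[5],     1}
        };
    }

    /** Reads p back, after scaling the matrix so that m[2][2] = 1 (assumed nonzero). */
    static double[] toParameters(double[][] m) {
        double s = m[2][2];
        return new double[] {
            m[0][0] / s - 1, m[0][1] / s, m[1][0] / s, m[1][1] / s - 1,
            m[2][0] / s,     m[2][1] / s, m[0][2] / s, m[1][2] / s
        };
    }

    static double[][] multiply(double[][] a, double[][] b) {
        double[][] c = new double[3][3];
        for (int i = 0; i < 3; i++)
            for (int j = 0; j < 3; j++)
                for (int k = 0; k < 3; k++)
                    c[i][j] += a[i][k] * b[k][j];
        return c;
    }

    /** Inverse of a 3x3 matrix via the adjugate; null if (numerically) singular. */
    static double[][] inverse(double[][] m) {
        double c00 = m[1][1] * m[2][2] - m[1][2] * m[2][1];
        double c01 = m[1][2] * m[2][0] - m[1][0] * m[2][2];
        double c02 = m[1][0] * m[2][1] - m[1][1] * m[2][0];
        double det = m[0][0] * c00 + m[0][1] * c01 + m[0][2] * c02;
        if (Math.abs(det) < 1e-12) return null;
        return new double[][] {
            {c00 / det, (m[0][2] * m[2][1] - m[0][1] * m[2][2]) / det, (m[0][1] * m[1][2] - m[0][2] * m[1][1]) / det},
            {c01 / det, (m[0][0] * m[2][2] - m[0][2] * m[2][0]) / det, (m[0][2] * m[1][0] - m[0][0] * m[1][2]) / det},
            {c02 / det, (m[0][1] * m[2][0] - m[0][0] * m[2][1]) / det, (m[0][0] * m[1][1] - m[0][1] * m[1][0]) / det}
        };
    }

    /** Parameters p' of T_p' = T_q^-1 followed by T_p; null if T_q is not invertible. */
    static double[] composeInverse(double[] p, double[] q) {
        double[][] mqInv = inverse(toMatrix(q));
        if (mqInv == null) return null;
        return toParameters(multiply(toMatrix(p), mqInv));
    }
}
```

For example, for a translation by (5, 3) composed with the inverse of a translation by (2, 1), the result is a translation by (3, 2), as expected.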


24.5 Example

Figure 24.4 shows an example of using the classic Lucas-Kanade (forward-additive) matcher. Initially, a rectangular region $Q$ is selected in the search image $I$, marked by the green rectangle in Fig. 24.4(a, b), which specifies the approximate position of the reference image. To create the (synthetic) reference image $R$, all four corners of the rectangle $Q$ were perturbed randomly in x- and y-direction by Gaussian noise (with $\sigma = 2.5$). The resulting quadrilateral $Q'$ (red outline in Fig. 24.4(a, b)) specifies the region in image $I$ from which the reference image $R$ was extracted by transformation and interpolation (see Fig. 24.4(d)). The matching process starts from the rectangle $Q$, which specifies the initial warp transformation $T_{init}$ (green rectangle $Q$), while the real (but unknown) transformation corresponds to the red quadrilateral $Q'$. Each iteration of the matcher updates the warp transformation $T$. The blue circles in Fig. 24.4(b) mark the corners of the back-projected reference frame under the changing transformation $T$; the radius of the circles corresponds to the remaining registration error between the reference image $R$ and the current subimage of $I$.

Figure 24.4(e) shows the steepest-descent images $s_0, \ldots, s_7$ (see Eqn. (24.28)) for the first iteration. Each of these images is of the same size as $R$ and corresponds to one of the 8 parameters $a, b, c, d, e, f, t_x, t_y$ of the projective warp transformation (see Eqn. (24.62)). The value $s_k(u, v)$ in a particular image $s_k$ corresponds to the optimal change of transformation parameter $k$ with respect to the associated image position $(u, v)$. The actual change of parameter $k$ results from combining these values over all positions $(u, v)$ of the reference image $R$: the products $s_k(u, v) \cdot D(u, v)$ are summed to form $\delta_p$ (Eqn. (24.26)), which is then multiplied by $H^{-1}$ (Eqn. (24.25)).

The example demonstrates the robustness and fast convergence of the classic Lucas-Kanade matcher, which typically requires only 5-20 iterations. In this case, the matcher performed 7 iterations to converge (with convergence limit $\varepsilon = 0.00001$). In comparison, the inverse-compositional matcher typically requires more iterations and is less tolerant to deviations of the initial warp transformation, that is, it has a smaller convergence range than the forward-additive algorithm.¹⁴

Fig. 24.4(e): steepest-descent images $s_0, \ldots, s_7$ for parameters $a, b, \ldots, t_x, t_y$: $s_0$ (param. $a$), $s_1$ (param. $b$), $s_2$ (param. $c$), $s_3$ (param. $d$), $s_4$ (param. $e$), $s_5$ (param. $f$), $s_6$ (param. $t_x$), $s_7$ (param. $t_y$).

Fig. 24.4
Lucas-Kanade (forward-additive) matcher with projective warp transformation. Original image $I$ (a, b); the initial warp transformation $T_{init}$ is visualized by the green rectangle $Q$, which corresponds to the subimage shown in (c). The actual reference image $R$ (d) has been extracted from the red quadrilateral $Q'$ (by transformation and interpolation). The blue circles mark the corners of the back-projected reference image under the changing transformation $T_p$. The radius of each circle corresponds to the registration error between the transformed reference image $R$ and the currently overlapping part of the search image $I$. The steepest-descent images $s_0, \ldots, s_7$ (one for each of the 8 parameters $a, b, c, d, e, f, t_x, t_y$ of the projective transformation) for the first iteration are shown in (e). These images are of the same size as the reference image $R$.


24.6 Java Implementation

The algorithms described in this chapter have been implemented in Java, with the source code available as part of the imagingbook¹⁵ library on the book's accompanying website. As usual, most Java variables and methods in the online code have been named similarly to the identifiers used in the text for easier understanding.

14 In fact, the inverse-compositional algorithm does not converge with this particular example.

15 Package imagingbook.pub.lucaskanade.
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24  Non-Rigid  Image  LucasKanadeMatcher  (class) 

Matching 

This  is  the  (abstract)  super-class  of  the  concrete  matchers  (For- 
wardAdditiveMatcher,  InverseCompositionalMatcher)  described 
further.  It  defines  a  static  inner  class  Parameters16  with  public  pa¬ 
rameter  fields  such  as 

tolerance  (=  e,  default  0.00001), 
maxlterations  (=  zmax,  default  100). 

In  addition,  class  LucasKanadeMatcher  itself  provides  the  following 
public  methods: 

Linear Mapping  getMatch  (Proj ectiveMapping  T) 

Performs a complete match on the given image pair I, R (required by the sub-class constructors), with T used as the initial geometric transformation. The transformation object T may be of any subtype of ProjectiveMapping,17 including Translation and AffineMapping. The method returns a new transformation object for the optimal match, or null if the matcher did not converge.

ProjectiveMapping iterateOnce (ProjectiveMapping T)

This method performs a single matching iteration with the current warp transformation T. It is typically invoked repeatedly after an initial call to initializeMatch(). The updated warp transformation is returned, or null if the iteration was unsuccessful (e.g., if the Hessian matrix could not be inverted).

boolean  hasConverged  () 

Returns true if (and only if) the minimization criteria (specified by the tolerance parameter) have been reached. This method is typically used to terminate the optimization loop after calling iterateOnce().

Point2D[] getReferencePoints ()

Returns the four corner points of the bounding rectangle of the reference image R, centered at the origin. All warp transformations (including T_init and T_p) refer to these coordinates. Note that the returned point coordinates are generally non-integer values; for example, for a reference image of size 11 × 8, the reference corner points are A = (−5, −3.5), B = (5, −3.5), C = (5, 3.5), and D = (−5, 3.5) (see Fig. 24.5).
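The corner coordinates follow directly from the image size. As a minimal sketch (our own code, not part of the imagingbook API; the class name ReferenceCorners is hypothetical), assuming sample positions 0, ..., w−1 and 0, ..., h−1 with the center at ((w−1)/2, (h−1)/2) as in Fig. 24.5, they can be computed as:

```java
import java.awt.geom.Point2D;

/** Corner points A, B, C, D of a w x h reference image centered at the origin (hypothetical helper). */
final class ReferenceCorners {

    static Point2D[] corners(int w, int h) {
        double xc = 0.5 * (w - 1);   // center x; samples lie at integer positions 0, ..., w-1
        double yc = 0.5 * (h - 1);   // center y; samples lie at integer positions 0, ..., h-1
        return new Point2D[] {
            new Point2D.Double(-xc, -yc),   // A
            new Point2D.Double( xc, -yc),   // B
            new Point2D.Double( xc,  yc),   // C
            new Point2D.Double(-xc,  yc)    // D
        };
    }

    public static void main(String[] args) {
        for (Point2D p : corners(11, 8)) {
            System.out.println("(" + p.getX() + ", " + p.getY() + ")");
        }
    }
}
```

For the 11 × 8 example, main() prints (-5.0, -3.5), (5.0, -3.5), (5.0, 3.5) and (-5.0, 3.5), matching the points listed above.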

ProjectiveMapping getReferenceMappingTo (Point2D[] Q)

Calculates  the  (linear)  geometric  transformation  between  the 
reference  image  R  (centered  at  the  origin)  and  the  quadrilateral 
specified  by  the  point  sequence  Q.  The  type  of  the  returned 
mapping  depends  on  the  number  of  points  in  Q  (max.  4). 
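How getReferenceMappingTo() computes its result is not shown here. For the four-point case, one standard way to obtain a projective mapping is to solve an 8 × 8 linear system built from the point correspondences. The following sketch does this, assuming the common parameterization x' = (a00·x + a01·y + a02)/(a20·x + a21·y + 1), y' = (a10·x + a11·y + a12)/(a20·x + a21·y + 1); the class ProjectiveFromPoints, its method names and this parameter ordering are our own and need not match the library's ProjectiveMapping class:

```java
import java.awt.geom.Point2D;

/** Projective mapping T with T(P[i]) = Q[i], i = 0..3 (hypothetical sketch, not the library class). */
final class ProjectiveFromPoints {

    private final double[] a;   // a00, a01, a02, a10, a11, a12, a20, a21 (a22 is fixed to 1)

    private ProjectiveFromPoints(double[] a) {
        this.a = a;
    }

    /** Builds and solves the 8x8 linear system for the four point pairs P[i] -> Q[i]. */
    static ProjectiveFromPoints fromPoints(Point2D[] P, Point2D[] Q) {
        if (P.length != 4 || Q.length != 4)
            throw new IllegalArgumentException("exactly 4 point pairs required");
        double[][] M = new double[8][];
        double[] b = new double[8];
        for (int i = 0; i < 4; i++) {
            double x = P[i].getX(), y = P[i].getY();
            double u = Q[i].getX(), v = Q[i].getY();
            // from u * (a20 x + a21 y + 1) = a00 x + a01 y + a02, and analogously for v
            M[2 * i]     = new double[] {x, y, 1, 0, 0, 0, -x * u, -y * u};
            M[2 * i + 1] = new double[] {0, 0, 0, x, y, 1, -x * v, -y * v};
            b[2 * i]     = u;
            b[2 * i + 1] = v;
        }
        return new ProjectiveFromPoints(solve(M, b));
    }

    /** Maps point p to (x', y') including the homogeneous division. */
    Point2D applyTo(Point2D p) {
        double x = p.getX(), y = p.getY();
        double w = a[6] * x + a[7] * y + 1;
        return new Point2D.Double((a[0] * x + a[1] * y + a[2]) / w,
                                  (a[3] * x + a[4] * y + a[5]) / w);
    }

    /** Solves M z = b by Gaussian elimination with partial pivoting (M and b are overwritten). */
    private static double[] solve(double[][] M, double[] b) {
        int n = b.length;
        for (int k = 0; k < n; k++) {
            int piv = k;                                  // row with the largest pivot candidate
            for (int i = k + 1; i < n; i++)
                if (Math.abs(M[i][k]) > Math.abs(M[piv][k])) piv = i;
            double[] row = M[k]; M[k] = M[piv]; M[piv] = row;
            double t = b[k]; b[k] = b[piv]; b[piv] = t;
            if (Math.abs(M[k][k]) < 1e-12)
                throw new IllegalArgumentException("degenerate point configuration");
            for (int i = k + 1; i < n; i++) {             // eliminate column k below the pivot
                double f = M[i][k] / M[k][k];
                for (int j = k; j < n; j++) M[i][j] -= f * M[k][j];
                b[i] -= f * b[k];
            }
        }
        double[] z = new double[n];                       // back substitution
        for (int i = n - 1; i >= 0; i--) {
            double s = b[i];
            for (int j = i + 1; j < n; j++) s -= M[i][j] * z[j];
            z[i] = s / M[i][i];
        }
        return z;
    }
}
```

For example, passing the four reference corners of an 11 × 8 image as P and the ROI corners as Q yields a mapping that sends each reference corner to the corresponding ROI corner. Plain Gaussian elimination is used here only for brevity; a library linear solver would be the more robust choice.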

double  getRmsError  () 

Returns the RMS error between images I and R for the most recent iteration (usually called after iterateOnce()).
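The text does not spell out the exact formula; a plausible reading is the square root of the mean squared intensity difference over all pixels of R. The following sketch assumes that the search image has already been warped onto the pixel grid of R and that both images are given as equally sized pixel arrays (e.g., the float[] obtained from FloatProcessor.getPixels() after a cast); the class name RmsError is hypothetical:

```java
/** RMS difference between two equally sized images given as pixel arrays (hypothetical helper). */
final class RmsError {

    static double rms(float[] warpedI, float[] R) {
        if (warpedI.length != R.length || R.length == 0)
            throw new IllegalArgumentException("arrays must be non-empty and of equal length");
        double sum2 = 0;
        for (int i = 0; i < R.length; i++) {
            double d = warpedI[i] - R[i];   // intensity difference at pixel i
            sum2 += d * d;
        }
        return Math.sqrt(sum2 / R.length);
    }
}
```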


16 See the usage example in Prog. 24.1.

17 Class ProjectiveMapping is described in Chapter 21, Sec. 21.1.4.
Fig. 24.5

Reference coordinates. The center of the reference image R is aligned with the origin of the search image I (red square), which is taken as the absolute origin. Image samples (indicated by round dots) are assumed to be located at integer positions. In this example, the reference image R is of size w_R = 11 and h_R = 8, thus the center coordinates are x_c = 5.0 and y_c = 3.5. In the x/y coordinate frame of I (i.e., absolute coordinates), the four corners of R's bounding rectangle are A = (−5, −3.5), B = (5, −3.5), C = (5, 3.5), and D = (−5, 3.5). All warp transformations refer to these reference points (cf. Figs. 24.2 and 24.3).


LucasKanadeForwardMatcher  (class) 

This sub-class of LucasKanadeMatcher implements the Lucas-Kanade (“forward-additive”) algorithm, as outlined in Alg. 24.1. It provides the aforementioned methods of LucasKanadeMatcher and two constructors:

LucasKanadeForwardMatcher  (FloatProcessor  I, 
FloatProcessor  R) 

Creates a new instance of LucasKanadeForwardMatcher using default parameter values, where I is the search image and R is the (smaller) reference image.

LucasKanadeForwardMatcher  (FloatProcessor  I, 
FloatProcessor  R,  Parameters  params) 

Creates  a  new  instance  of  type  LucasKanadeForwardMatcher 
using  the  specific  settings  in  params. 

LucasKanadeInverseMatcher (class)

This sub-class of LucasKanadeMatcher implements the “inverse compositional” algorithm, as described in Alg. 24.2. It provides the same methods and constructors as class LucasKanadeForwardMatcher:

LucasKanadeInverseMatcher (FloatProcessor I, FloatProcessor R)

LucasKanadeInverseMatcher (FloatProcessor I, FloatProcessor R, Parameters params)

24.6.1  Application  Example 

The code example in Prog. 24.1 demonstrates the use of the Lucas-Kanade API. The ImageJ plugin is applied to the search image I (the current image) and requires a rectangular ROI to be selected, which is taken as the initial guess for the match region. The reference image is created synthetically by extracting a warped sub-image

Prog. 24.1

Lucas-Kanade code example (ImageJ plugin). This plugin is applied to the search image (I) and assumes that a rectangular ROI is selected whose bounding rectangle and corner points (Q) are obtained in lines 22-27. The search image I is copied from the current image (as a FloatProcessor object) in line 19. The size of the reference image R (created in line 24) is defined by the ROI rectangle, whose corner points Q also determine the initial parameters of the geometric transformation T_init (lines 27 and 37, respectively). The synthetic reference image R (with the same size as the ROI) is extracted from the search image by warping from a quadrilateral (QQ), which is obtained by randomly perturbing the corner points of the selected ROI (lines 28-29). A new matcher object is created in lines 32-33, in this case of type LucasKanadeForwardMatcher (alternatively, LucasKanadeInverseMatcher could have been used). The actual match operation is performed in lines 40-44. It consists of a simple do-while loop, which terminates if the transformation T becomes invalid (null), the matcher has converged, or the maximum number of iterations has been reached. Alternatively, lines 40-44 could have been replaced by the statement T = matcher.getMatch(Tinit).

If the matcher has converged, the final transformation T_p maps to the best-matching sub-image of I.


 1  import ...
 2
 3  public class LucasKanade_Demo implements PlugInFilter {
 4    ImagePlus img;   // the current image (assigned in setup())
 5    static int maxIterations = 100;
 6
 7    public int setup(String args, ImagePlus img) {
 8      this.img = img;  return DOES_8G + ROI_REQUIRED;
 9    }
10
11    public void run(ImageProcessor ip) {
12      Roi roi = img.getRoi();
13      if (roi == null || roi.getType() != Roi.RECTANGLE) {
14        IJ.error("Rectangular selection required!");
15        return;
16      }
17
18      // Step 1: create the search image I:
19      FloatProcessor I = ip.convertToFloatProcessor();
20
21      // Step 2: create the (empty) reference image R:
22      Rectangle roiR = roi.getBounds();
23      FloatProcessor R =
24          new FloatProcessor(roiR.width, roiR.height);
25
26      // Step 3: perturb the rectangle Q to Q' to extract reference image R:
27      Point2D[] Q = getCornerPoints(roiR);   // = Q
28      Point2D[] QQ = perturbGaussian(Q);     // = Q'
29      (new ImageExtractor(I)).extractImage(R, QQ);
30
31      // Step 4: create the Lucas-Kanade matcher (forward or inverse):
32      LucasKanadeMatcher matcher =
33          new LucasKanadeForwardMatcher(I, R);
34
35      // Step 5: calculate the initial mapping T_init:
36      ProjectiveMapping Tinit =
37          matcher.getReferenceMappingTo(Q);
38
39      // Step 6: initialize and run the matching loop:
40      ProjectiveMapping T = Tinit;
41      do {
42        T = matcher.iterateOnce(T);
43      } while (T != null && !matcher.hasConverged() &&
44               matcher.getIteration() < maxIterations);
45
46      // Step 7: evaluate the result:
47      if (T == null || !matcher.hasConverged()) {
48        IJ.log("no match found!");
49        return;
50      }
51      else {
52        ProjectiveMapping Tfinal = T;
53        // ... (process the final mapping Tfinal)
54      }
55
56    }
57    // helper methods getCornerPoints() and perturbGaussian() not shown
58  }
of I from a random quadrilateral around the selected ROI.18 The required geometric transformations (such as ProjectiveMapping, AffineMapping, Translation, etc.) are described in Chapter 21, Sec. 21.1.

The example demonstrates how the Lucas-Kanade matcher is initialized and called repeatedly inside the optimization loop using a projective transformation. This usage mode is intended specifically for testing purposes, since it allows the state of the matcher to be retrieved after every iteration. The same result could be obtained by replacing the whole loop (lines 40-44 in Prog. 24.1) with the single instruction

ProjectiveMapping T = matcher.getMatch(Tinit);

Moreover, in line 33, the LucasKanadeForwardMatcher could be replaced by an instance of LucasKanadeInverseMatcher without any additional changes. For further details, see the complete source code on the book’s website.


24.7  Exercises 

Exercise 24.1. Determine the general structure of the Hessian matrix for the projective transformation (see Sec. 24.4.3), analogous to the affine transformation in Eqns. (24.58)-(24.60).

Exercise 24.2. Create comparative statistics of the convergence properties of the classes LucasKanadeForwardMatcher and LucasKanadeInverseMatcher by evaluating the number of iterations required, including the percentage of failures. Use a test scenario with randomly perturbed reference regions as shown in Prog. 24.1.

Exercise 24.3. It is sometimes suggested to refine the warp transformation step by step instead of using the full transformation for the whole matching process. For example, one could first match with a pure translation model, then (starting from the result of the first match) switch to an affine transformation model, and eventually apply a full projective transformation. Explore this idea and find out whether it yields a more robust matching process.

Exercise 24.4. Adapt the 2D Lucas-Kanade method described in Sec. 24.2 for the registration of discrete 1D signals under shifting and scaling. Given are a search signal I(u), for u = 0, ..., M_I − 1, and a reference signal R(u), for u = 0, ..., M_R − 1. It is assumed that I contains a transformed version of R, which is specified by the mapping T_p(x) = s · x + t, with the two unknown parameters p = (s, t). A practical application could be the registration of neighboring image lines under perspective distortion.

Exercise 24.5. Use the Lucas-Kanade matcher to design a tracker that follows a given reference patch through a sequence of N images. Hint: In ImageJ, an image sequence (AVI video or multi-frame TIFF)
18 The class ImageExtractor, used to extract the warped sub-image, is part of the imagingbook library (package imagingbook.lib.image).
can be imported as an ImageStack and simply processed frame by frame. Select the original reference patch in the first frame of the image sequence and use its position to calculate the initial warp transformation to find a match in the second image. Subsequently, take the match obtained in the second image as the initial transformation for the third image, etc. Consider two approaches: (a) use the initial patch as the reference image for all frames of the sequence or (b) extract a new reference image for each pair of frames.
Scale-Invariant Feature Transform (SIFT)


Many real applications require the localization of reference positions in one or more images, for example, for image alignment, removing distortions, object tracking, 3D reconstruction, etc. We have seen that corner points1 can be located quite reliably and independently of orientation. However, typical corner detectors only provide the position and strength of each candidate point; they do not provide any information about its characteristics or “identity” that could be used for matching. Another limitation is that most corner detectors only operate at a particular scale or resolution, since they are based on a rigid set of filters.

This chapter describes the Scale-Invariant Feature Transform (SIFT) technique for local feature detection, which was originally proposed by D. Lowe [152] and has since become a “workhorse” method in the imaging industry. Its goal is to locate image features that can be identified robustly to facilitate matching in multiple images and image sequences as well as object recognition under different viewing conditions. SIFT employs the concept of “scale space” [151] to capture features at multiple scale levels or image resolutions, which not only increases the number of available features but also makes the method highly tolerant to scale changes. This makes it possible, for example, to track features on objects that move towards the camera and thereby change their scale continuously or to stitch together images taken with widely different zoom settings.

Accelerated variants of the SIFT algorithm have been implemented by streamlining the scale space calculation and feature detection or by using GPU hardware [20,90,218].

In principle, SIFT works like a multi-scale corner detector with sub-pixel positioning accuracy and a rotation-invariant feature descriptor attached to each candidate point. This (typically 128-dimensional) feature descriptor summarizes the distribution of the gradient directions in a spatial neighborhood around the corresponding feature point and can thus be used like a “fingerprint”. The main steps involved in the calculation of SIFT features are as follows:

1  See  Chapter  7. 
1. Extrema detection in a Laplacian-of-Gaussian (LoG) scale space to locate potential interest points.

2. Key point refinement by fitting a continuous model to determine precise location and scale.

3. Orientation assignment by determining the dominant orientation of the feature point from the directions of the surrounding image gradients.

4. Formation of the feature descriptor by normalizing the local gradient histogram.

These steps are all described in the remaining parts of this chapter. There are several reasons why we explain the SIFT technique here in such great detail. For one, it is by far the most complex algorithm that we have looked at so far; its individual steps are carefully designed and delicately interdependent, with numerous parameters that need to be considered. A good understanding of the inner workings and limitations is thus important for successful use as well as for analyzing problems if the results are not as expected.


25.1  Interest  Points  at  Multiple  Scales 

The first step in detecting interest points is to find locations with stable features that can be localized under a wide range of viewing conditions and different scales. In the SIFT approach, interest point detection is based on Laplacian-of-Gaussian (LoG) filters, which respond primarily to distinct bright blobs surrounded by darker regions, or vice versa. Unlike the filters used in popular corner detectors,2 LoG filters are isotropic, i.e., insensitive to orientation. To locate interest points over multiple scales, a scale space representation of the input image is constructed by recursively smoothing the image with a sequence of small Gaussian filters. The difference between the images in adjacent scale layers is used to approximate the LoG filter at each scale. Interest points are finally selected by finding the local maxima in the 3D LoG scale space.
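Selecting local extrema in a 3D scale space amounts to comparing each sample with its 26 neighbors in the 3 × 3 × 3 neighborhood spanning the same and the two adjacent scale layers. A minimal sketch, assuming the (DoG-approximated) LoG layers are stored as a float[level][row][column] array; the class and method names are hypothetical, and since both bright and dark blobs are of interest, strict minima are accepted as well as strict maxima:

```java
/** Strict local extremum test in a 3x3x3 scale space neighborhood (hypothetical helper). */
final class ScaleSpaceExtrema {

    /**
     * D is indexed as D[level][row][column]; (k, v, u) must have a neighbor on each side
     * in all three dimensions (i.e., it must not lie on the border of the array).
     */
    static boolean isLocalExtremum(float[][][] D, int k, int v, int u) {
        float c = D[k][v][u];
        boolean isMax = true, isMin = true;
        for (int dk = -1; dk <= 1; dk++) {
            for (int dv = -1; dv <= 1; dv++) {
                for (int du = -1; du <= 1; du++) {
                    if (dk == 0 && dv == 0 && du == 0) continue;   // skip the center sample
                    float n = D[k + dk][v + dv][u + du];
                    if (n >= c) isMax = false;                     // not strictly larger than all
                    if (n <= c) isMin = false;                     // not strictly smaller than all
                }
            }
        }
        return isMax || isMin;
    }
}
```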

25.1.1  The  LoG  Filter 

In this section, we first outline LoG filters and the basic construction of a Gaussian scale space, followed by a detailed description of the actual implementation and the parameters used in the SIFT approach.

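The LoG is a so-called center-surround operator, which responds most strongly to isolated local intensity peaks and to edge- and corner-like image structures. The corresponding filter kernel is based on the second derivative of the Gaussian function, as illustrated in Fig. 25.1 for the 1D case. The 1D Gaussian function of width σ is defined as

$$ G_\sigma(x) = \frac{1}{\sqrt{2\pi}\cdot\sigma}\cdot e^{-\frac{x^2}{2\sigma^2}} \tag{25.1} $$

and its first derivative is

$$ G'_\sigma(x) = \frac{dG_\sigma}{dx}(x) = -\frac{x}{\sigma^2}\cdot G_\sigma(x) = -\frac{x}{\sqrt{2\pi}\cdot\sigma^3}\cdot e^{-\frac{x^2}{2\sigma^2}} . \tag{25.2} $$

2 See Chapter 7.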
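Analogously, the second derivative of the 1D Gaussian is

$$ G''_\sigma(x) = \frac{d^2G_\sigma}{dx^2}(x) = \frac{x^2-\sigma^2}{\sigma^4}\cdot G_\sigma(x) = \frac{x^2-\sigma^2}{\sqrt{2\pi}\cdot\sigma^5}\cdot e^{-\frac{x^2}{2\sigma^2}} . \tag{25.3} $$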
The Laplacian (denoted ∇²) of a continuous, 2D function f(x,y) is defined as the sum of the second partial derivatives in the x- and y-directions, traditionally written as

$$ (\nabla^2 f)(x,y) = \frac{\partial^2 f}{\partial x^2}(x,y) + \frac{\partial^2 f}{\partial y^2}(x,y) . \tag{25.4} $$


Note that, unlike the gradient3 of a 2D function, the result of the Laplacian is not a vector but a scalar quantity. Its value is invariant under rotations of the coordinate system; that is, the Laplacian operator has the important property of being isotropic.
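The isotropy of the Laplacian can be checked numerically. The sketch below (hypothetical class GaussLaplacian) evaluates the 1D Gaussian and its derivatives from Eqns. (25.1)-(25.3), compares them with central finite differences, and uses the separability G_σ(x, y) = G_σ(x)·G_σ(y) of the 2D Gaussian to compute its Laplacian as G''_σ(x)·G_σ(y) + G_σ(x)·G''_σ(y). Points at the same distance from the origin give the same value, although the individual second partial derivatives differ:

```java
/** 1D Gaussian derivatives and the Laplacian of the separable 2D Gaussian (hypothetical helper). */
final class GaussLaplacian {

    static double g(double x, double s) {     // G_sigma(x)
        return Math.exp(-x * x / (2 * s * s)) / (Math.sqrt(2 * Math.PI) * s);
    }

    static double dg(double x, double s) {    // G'_sigma(x) = -x / sigma^2 * G_sigma(x)
        return -x / (s * s) * g(x, s);
    }

    static double ddg(double x, double s) {   // G''_sigma(x) = (x^2 - sigma^2) / sigma^4 * G_sigma(x)
        return (x * x - s * s) / (s * s * s * s) * g(x, s);
    }

    /** Laplacian of G_sigma(x, y) = G_sigma(x) * G_sigma(y). */
    static double laplacian(double x, double y, double s) {
        return ddg(x, s) * g(y, s) + g(x, s) * ddg(y, s);
    }

    public static void main(String[] args) {
        double s = 2.0;
        // three points at distance 5 from the origin
        System.out.println(laplacian(3, 4, s) + " " + laplacian(5, 0, s) + " " + laplacian(0, -5, s));
        // analytic derivatives vs. central finite differences
        double x = 0.7, h = 1e-4;
        System.out.println(dg(x, s) + " ~ " + (g(x + h, s) - g(x - h, s)) / (2 * h));
        System.out.println(ddg(x, s) + " ~ " + (g(x + h, s) - 2 * g(x, s) + g(x - h, s)) / (h * h));
    }
}
```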

By applying the Laplacian operator to a rotationally symmetric 2D Gaussian,

$$ G_\sigma(x,y) = \frac{1}{2\pi\sigma^2}\cdot e^{-\frac{x^2+y^2}{2\sigma^2}} , \tag{25.5} $$

with identical widths σ = σ_x = σ_y in the x/y directions (see Fig. 25.2(a)), we obtain the LoG function

$$ L_\sigma(x,y) = (\nabla^2 G_\sigma)(x,y) = \frac{1}{\pi\sigma^4}\cdot\frac{x^2+y^2-2\sigma^2}{2\sigma^2}\cdot e^{-\frac{x^2+y^2}{2\sigma^2}} , \tag{25.6} $$

as shown in Fig. 25.2(b). The continuous LoG function in Eqn. (25.6) has the absolute value integral

$$ \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} |L_\sigma(x,y)|\; dx\, dy = \frac{4}{\sigma^2 e} \tag{25.7} $$

3 See Chapter 6, Sec. 6.2.1.



Fig.  25.1 

1D Gaussian function G_σ(x) with σ = 1 (black), its first derivative G'_σ(x) (green), and second derivative G''_σ(x) (blue).
Fig. 25.2

2D Gaussian and LoG. Gaussian function G_σ(x,y) with σ = 1 (a); the corresponding LoG function L_σ(x,y) in (b), and the inverted function (“Mexican hat” or “Sombrero” kernel) −L_σ(x,y) in (c). For illustration, all three functions are normalized to an absolute value of 1 at the origin.


and zero average, that is,

$$ \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} L_\sigma(x,y)\; dx\, dy = 0 . \tag{25.8} $$
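Both properties are easy to confirm numerically. The following sketch (hypothetical class LoGIntegrals) integrates L_σ from Eqn. (25.6) with the midpoint rule over the square [−10σ, 10σ]²; the grid size and the truncation of the domain are our own choices. The first integral comes out as zero up to rounding, and the second approaches 4/(σ²e):

```java
/** Numerical check of the zero average and the absolute value integral of L_sigma (hypothetical helper). */
final class LoGIntegrals {

    /** L_sigma(x, y) as in Eqn. (25.6). */
    static double loG(double x, double y, double s) {
        double r2 = x * x + y * y, s2 = s * s;
        return (r2 - 2 * s2) / (2 * Math.PI * s2 * s2 * s2) * Math.exp(-r2 / (2 * s2));
    }

    /** Midpoint-rule approximations of the integrals of L and |L| over [-10s, 10s]^2 (n x n cells). */
    static double[] integrals(double s, int n) {
        double R = 10 * s, h = 2 * R / n;
        double sum = 0, sumAbs = 0;
        for (int i = 0; i < n; i++) {
            double x = -R + (i + 0.5) * h;
            for (int j = 0; j < n; j++) {
                double v = loG(x, -R + (j + 0.5) * h, s);
                sum += v;
                sumAbs += Math.abs(v);
            }
        }
        return new double[] {sum * h * h, sumAbs * h * h};
    }

    public static void main(String[] args) {
        for (double s : new double[] {1.0, 2.0}) {
            double[] r = integrals(s, 1000);
            System.out.printf("sigma = %.1f: int L = %.1e, int |L| = %.5f, 4/(sigma^2 e) = %.5f%n",
                              s, r[0], r[1], 4 / (s * s * Math.E));
        }
    }
}
```

Multiplying the second value by σ² gives the same number, 4/e ≈ 1.4715, for every σ, which anticipates the scale-normalized LoG discussed below.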


When used as the kernel of a linear filter,4 the LoG responds maximally to circular spots that are darker than the surrounding background and have a radius of approximately σ.5 Blobs that are brighter than the surrounding background are enhanced by filtering with the negative LoG kernel, that is, −L_σ, which is often referred to as the “Mexican hat” or “Sombrero” filter (see Fig. 25.2). Both types of blobs can be detected simultaneously by simply taking the absolute value of the filter response (see Fig. 25.3).
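The sign behavior can be reproduced with a sampled kernel. In the sketch below (hypothetical class LoGSpots), the kernel samples L_σ at integer positions within a support radius of ⌈4σ⌉, following the recommendation in footnote 4, and is applied at the center of synthetic images containing a disk of radius 3 on a uniform background; the image size, disk radius and intensities are our own choices. The dark disk produces a positive response and the bright disk a negative one:

```java
/** Discrete LoG kernel applied to synthetic dark and bright spots (hypothetical helper). */
final class LoGSpots {

    /** Samples L_sigma (Eqn. 25.6) at integer positions within a radius of ceil(4 sigma). */
    static double[][] kernel(double s) {
        int r = (int) Math.ceil(4 * s);
        double s2 = s * s;
        double[][] k = new double[2 * r + 1][2 * r + 1];
        for (int j = -r; j <= r; j++) {
            for (int i = -r; i <= r; i++) {
                double r2 = i * i + j * j;
                k[j + r][i + r] = (r2 - 2 * s2) / (2 * Math.PI * s2 * s2 * s2) * Math.exp(-r2 / (2 * s2));
            }
        }
        return k;
    }

    /** n x n image with a disk of radius rad (value fg) centered on a uniform background bg. */
    static double[][] disk(int n, double rad, double fg, double bg) {
        double[][] img = new double[n][n];
        for (int v = 0; v < n; v++) {
            for (int u = 0; u < n; u++) {
                double dx = u - n / 2, dy = v - n / 2;
                img[v][u] = (dx * dx + dy * dy <= rad * rad) ? fg : bg;
            }
        }
        return img;
    }

    /** Filter response at the image center (the kernel is symmetric, so correlation = convolution). */
    static double responseAtCenter(double[][] img, double[][] k) {
        int r = k.length / 2, cy = img.length / 2, cx = img[0].length / 2;
        double sum = 0;
        for (int j = -r; j <= r; j++)
            for (int i = -r; i <= r; i++)
                sum += k[j + r][i + r] * img[cy + j][cx + i];
        return sum;
    }

    public static void main(String[] args) {
        double[][] k = kernel(3.0);                        // 25 x 25 kernel
        System.out.println("dark spot:   " + responseAtCenter(disk(41, 3, 0, 1), k));
        System.out.println("bright spot: " + responseAtCenter(disk(41, 3, 1, 0), k));
    }
}
```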

Since the LoG function is based on derivatives, its magnitude strongly depends on the steepness of the Gaussian slope, which is controlled by σ. To obtain responses of comparable magnitude over multiple scales, a scale-normalized LoG kernel can be defined in the form [151]


$$ \hat{L}_\sigma(x,y) = \sigma^2\cdot(\nabla^2 G_\sigma)(x,y) = \sigma^2\cdot L_\sigma(x,y) \tag{25.9} $$

$$ \hat{L}_\sigma(x,y) = \frac{1}{\pi\sigma^2}\cdot\frac{x^2+y^2-2\sigma^2}{2\sigma^2}\cdot e^{-\frac{x^2+y^2}{2\sigma^2}} . \tag{25.10} $$


4 To produce a sufficiently accurate discrete LoG filter kernel, the support radius should be set to at least 4σ (kernel diameter ≥ 8σ).

5  The  LoG  is  often  used  as  a  model  for  early  processes  in  biological  vision 
systems  [161],  particularly  to  describe  the  center-surround  response  of 
receptive  fields.  In  this  model,  an  “on-center”  cell  is  stimulated  when 
the  center  of  its  receptive  field  is  exposed  to  light,  and  is  inhibited  when 
light  falls  on  its  surround.  Conversely,  an  “off-center”  cell  is  stimulated 
by light falling on its surround. Thus filtering with the original LoG L_σ (Eqn. (25.6)) corresponds to the behavior of off-center cells, while the response to the negative LoG kernel −L_σ is that of an on-center cell.
Fig. 25.3

Filtering with the LoG kernel (with σ = 3). Original images (a). A linear filter with the LoG kernel L_σ(x,y) responds strongest to dark spots in a bright surround (b), while the inverted kernel −L_σ(x,y) responds strongest to bright spots in a dark surround (c). In (b, c), zero values are shown as medium gray, negative values are dark, positive values are bright. The absolute value of (b) or (c) combines the responses from both dark and bright spots (d).


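Note that the absolute value integral of this function,

$$ \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} |\hat{L}_\sigma(x,y)|\; dx\, dy = \frac{4}{e} , \tag{25.11} $$

is constant and thus (unlike Eqn. (25.7)) independent of the scale parameter σ (see Fig. 25.4).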


Approximating the LoG by the difference of two Gaussians (DoG)

Although the LoG is “quasi-separable” [113,243] and can thus be calculated efficiently, the most common method for implementing the LoG filter is to approximate it by the difference of two Gaussians (DoG) of widths σ and κσ, respectively, that is,

$$ D_{\sigma,\kappa}(x,y) = G_{\kappa\sigma}(x,y) - G_\sigma(x,y) \approx \frac{1}{\lambda}\cdot L_\sigma(x,y) , \tag{25.12} $$
Fig. 25.4

Normalization of the LoG function. Cross section of the LoG function L_σ(x,y) as defined in Eqn. (25.6) (a); scale-normalized LoG (b) as defined in Eqn. (25.10). σ = 1.0 (black), σ = √2 (green), σ = 2.0 (blue). All three functions in (b) have the same absolute value integral, which is independent of σ (see Eqn. (25.11)).


with the parameter κ > 1 specifying the relative width of the two Gaussians (defined in Eqn. (25.5)). Properly scaled (by some factor λ, see Eqn. (25.13)), the DoG function D_{σ,κ}(x,y) approximates the LoG function L_σ(x,y) in Eqn. (25.6) with arbitrary precision as κ approaches 1 (κ = 1 being excluded, of course). In practice, values of κ in the range 1.1, ..., 1.3 yield sufficiently accurate results. As an example, Fig. 25.5 shows the cross-section of the 2D DoG function for κ = 2^{1/3} ≈ 1.25992.6


Fig. 25.5

Approximating the LoG by the DoG. The two original Gaussians, G_a(x) with σ_a = 1.0 and G_b(x) with σ_b = κ = 2^{1/3}, are shown by the green and blue curves, respectively (a). The red curve in (a) shows the DoG function D_{σ,κ}(x,y) = G_b(x,y) − G_a(x,y) for y = 0. In (b), the dashed line shows the reference LoG function in comparison to the DoG (red). The DoG is scaled to match the magnitude of the LoG function.


The factor λ ∈ ℝ in Eqn. (25.12) controls the magnitude of the DoG function; it depends on both the ratio κ and the scale parameter σ. To match the magnitude of the original LoG (Eqn. (25.6)) at the origin, it must be set to

$$ \lambda = \frac{2\kappa^2}{\sigma^2\,(\kappa^2-1)} . \tag{25.13} $$


Similarly, the scale-normalized LoG L̂_σ (Eqn. (25.10)) can be approximated by the DoG function D_{σ,κ} (Eqn. (25.12)) as

$$ \hat{L}_\sigma(x,y) = \sigma^2 L_\sigma(x,y) \approx \sigma^2\lambda\cdot D_{\sigma,\kappa}(x,y) = \frac{2\kappa^2}{\kappa^2-1}\cdot D_{\sigma,\kappa}(x,y) , \tag{25.14} $$


6 The factor κ = 2^{1/3} originates from splitting the scale interval 2 (i.e., one scale octave) into 3 equal intervals, as described later on. Another factor frequently mentioned in the literature is 1.6, which, however, does not yield a satisfactory approximation. Possibly that value refers to the ratio of the variances σ_b²/σ_a² and not the ratio of the standard deviations
Table 25.1

Scale space-related symbols used in this chapter.

| Symbol | Meaning |
|---|---|
| 𝒢(x, y, σ) | continuous Gaussian scale space |
| G = (G_0, ..., G_{K−1}) | discrete Gaussian scale space with K levels |
| G_k | single level in a discrete Gaussian scale space |
| L = (L_0, ..., L_{K−1}) | discrete LoG scale space with K levels |
| L_k | single level in a LoG scale space |
| D = (D_0, ..., D_{K−1}) | discrete DoG scale space with K levels |
| D_k | single level in a DoG scale space |
| 𝐆 = (𝐆_0, ..., 𝐆_{P−1}) | hierarchical Gaussian scale space with P octaves |
| 𝐆_p = (G_{p,0}, ..., G_{p,Q−1}) | octave in a hierarchical Gaussian scale space with Q levels |
| G_{p,q} | single level in a hierarchical Gaussian scale space |
| 𝐃 = (𝐃_0, ..., 𝐃_{P−1}) | hierarchical DoG scale space with P octaves |
| 𝐃_p = (D_{p,0}, ..., D_{p,Q−1}) | octave in a hierarchical DoG scale space with Q levels |
| D_{p,q} | single level in a hierarchical DoG scale space |
| N_c(i, j, k) | 3 × 3 × 3 neighborhood in DoG scale space |
| k = (p, q, u, v) | discrete key point position in hierarchical scale space (p, q, u, v ∈ ℤ) |
| k' = (p, q, x, y) | continuous (refined) key point position (x, y ∈ ℝ) |


with the factor λ̂ = σ²·λ = 2κ²/(κ² − 1) being constant and therefore independent of the scale σ. Thus, as pointed out in [153], with a fixed scale increment κ, the DoG already approximates the scale-normalized LoG up to a constant factor, and thus no additional scaling is required to compare the magnitudes of the DoG responses obtained at different scales.7
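The relations in Eqns. (25.13) and (25.14) can be checked directly. The following sketch (hypothetical class DoGApprox) evaluates G_σ, L_σ and D_{σ,κ} as functions of the squared radius, confirms that λ·D_{σ,κ}(0) = L_σ(0) and that σ²·λ does not depend on σ, and reports the largest deviation of λ·D_{σ,κ} from L_σ relative to |L_σ(0)|; the sampling range 0 ≤ r ≤ 4σ and its resolution are our own choices:

```java
/** DoG approximation of the LoG, cf. Eqns. (25.5), (25.6), (25.12)-(25.14) (hypothetical helper). */
final class DoGApprox {

    static double gauss(double r2, double s) {             // 2D Gaussian G_s at squared radius r2
        return Math.exp(-r2 / (2 * s * s)) / (2 * Math.PI * s * s);
    }

    static double loG(double r2, double s) {               // L_s at squared radius r2
        double s2 = s * s;
        return (r2 - 2 * s2) / (2 * Math.PI * s2 * s2 * s2) * Math.exp(-r2 / (2 * s2));
    }

    static double doG(double r2, double s, double kappa) { // D_{s,kappa} = G_{kappa s} - G_s
        return gauss(r2, kappa * s) - gauss(r2, s);
    }

    static double lambda(double s, double kappa) {         // scale factor, Eqn. (25.13)
        double k2 = kappa * kappa;
        return 2 * k2 / (s * s * (k2 - 1));
    }

    /** Largest |lambda * D - L| for 0 <= r <= 4s, relative to |L(0)|. */
    static double maxError(double s, double kappa) {
        double lam = lambda(s, kappa), err = 0;
        for (int i = 0; i <= 400; i++) {
            double r = 4 * s * i / 400.0;
            err = Math.max(err, Math.abs(lam * doG(r * r, s, kappa) - loG(r * r, s)));
        }
        return err / Math.abs(loG(0, s));
    }

    public static void main(String[] args) {
        double kappa = Math.cbrt(2.0);                      // 2^(1/3), about 1.25992
        for (double s : new double[] {1.0, 2.0, 4.0}) {
            double lam = lambda(s, kappa);
            System.out.printf("sigma=%.1f  lambda=%.5f  sigma^2*lambda=%.5f  max rel. error=%.4f%n",
                              s, lam, s * s * lam, maxError(s, kappa));
        }
        System.out.printf("kappa=1.01: max rel. error=%.4f%n", maxError(1.0, 1.01));
    }
}
```

With κ = 2^{1/3} the deviation is larger than with κ = 1.01, consistent with the statement that the approximation becomes exact as κ approaches 1.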

In the SIFT approach, the DoG is used as an approximation of the (scale-normalized) LoG filter at multiple scales, based on a Gaussian scale space representation of the input image that is described next.8

25.1.2  Gaussian  Scale  Space 

The  concept  of  scale  space  [150]  is  motivated  by  the  observation  that 
real-world  scenes  exhibit  relevant  image  features  over  a  large  range  of 
sizes and, depending on the particular viewing situation, at various
scales. To relate image structures of different and unknown
sizes,  it  is  useful  to  represent  the  images  simultaneously  at  different 
scale  levels.  The  scale  space  representation  of  an  image  adds  scale  as 
a  third  coordinate  (in  addition  to  the  two  image  coordinates).  Thus 
the  scale  space  is  a  3D  structure,  which  can  be  navigated  not  only 
along  the  x/y  positions  but  also  across  different  scale  levels. 

Continuous  Gaussian  scale  space 

The  scale-space  representation  of  an  image  at  a  particular  scale  level 
is  obtained  by  filtering  the  image  with  a  kernel  that  is  parameterized 
to  the  desired  scale.  Because  of  its  unique  properties  [11,71],  the 
most  common  type  of  scale  space  is  based  on  successive  filtering  with 
Gaussian  kernels.  Conceptually,  given  a  continuous,  2D  function 
F(x,y),  its  Gaussian  scale  space  representation  is  a  3D  function 

7 See Sec. E.4 in the Appendix for additional details.

8 See Table 25.1 for a summary of the most important scale space-related symbols used in this chapter.
$$ \mathcal{G}(x,y,\sigma) = (F * H^{G,\sigma})(x,y) , \tag{25.15} $$

where H^{G,σ} = G_σ(x,y) is a 2D Gaussian kernel (see Eqn. (25.5)) with unit integral, and * denotes the linear convolution over x, y. Note that σ ≥ 0 serves as both the continuous scale parameter and the width of the corresponding Gaussian filter kernel.

A fully continuous Gaussian scale space 𝒢(x,y,σ) covers a 3D volume and represents the original function F(x,y) at varying scales σ. For σ = 0, the Gaussian kernel H^{G,0} has zero width, which makes it equivalent to an impulse or Dirac function δ(x,y).9 This is the neutral element of linear convolution, that is,

$$ \mathcal{G}(x,y,0) = (F * H^{G,0})(x,y) = (F * \delta)(x,y) = F(x,y) . \tag{25.16} $$

Thus the base level 𝒢(x,y,0) of the Gaussian scale space is identical to the input function F(x,y). In general (with σ > 0), the Gaussian kernel H^{G,σ} acts as a low-pass filter with a cutoff frequency proportional to 1/σ (see Sec. E.3 in the Appendix), the maximum frequency (or bandwidth) of the original “signal” F(x,y) being potentially unlimited.

Discrete  Gaussian  scale  space 

This is different for a discrete input function I(u,v), whose bandwidth is implicitly limited to half the sampling frequency, as mandated by the sampling theorem to avoid aliasing.10 Thus, in the discrete case, the lowest level 𝒢(x,y,0) of the Gaussian scale space is not accessible! To model the implicit bandwidth limitations of the sampling process, the discrete input image I(u,v) is assumed to be pre-filtered (with respect to the underlying continuous signal) with a Gaussian kernel of width σ_s ≥ 0.5 [153], that is,

$$ \mathcal{G}(u,v,\sigma_s) = I(u,v) . \tag{25.17} $$

Thus the discrete input image I(u,v) is implicitly placed at some initial level σ_s of the Gaussian scale space, and the lower levels with σ < σ_s are not available.

Any higher level σ_h > σ_s of the Gaussian scale space can be derived from the original image I(u,v) by filtering with a Gaussian kernel H^{G,σ}, that is,

$$ \mathcal{G}(u,v,\sigma_h) = (I * H^{G,\sigma})(u,v), \quad\text{with}\quad \sigma = \sqrt{\sigma_h^2 - \sigma_s^2} . \tag{25.18} $$

This is due to the fact that applying two Gaussian filters of widths σ_1 and σ_2, one after the other, is equivalent to a single convolution with a Gaussian kernel of width σ_12, that is,11

$$ (I * H^{G,\sigma_1}) * H^{G,\sigma_2} = I * H^{G,\sigma_{12}} , \tag{25.19} $$

9  See  Chapter  5,  Sec.  5.3.4. 

10  See  Chapter  18,  Sec.  18.2.1. 

11 See Sec. E.1 in the Appendix for additional details on combining Gaussian filters.


with σ_12 = (σ_1² + σ_2²)^{1/2}. We define the discrete Gaussian scale space representation of an image I as a vector of M images, one for each scale level m:

$$ G = (G_0, G_1, \ldots, G_{M-1}) . \tag{25.20} $$

Associated with each level $G_m$ is its absolute scale $\sigma_m > 0$, and each level $G_m$ represents a blurred version of the original image, that is, $G_m(u,v) = \mathcal{G}(u,v,\sigma_m)$ in the notation introduced in Eqn. (25.15). The scale ratio between adjacent scale levels,

$$ \Delta_\sigma = \frac{\sigma_{m+1}}{\sigma_m}, \tag{25.21} $$

is predefined and constant. Usually, $\Delta_\sigma$ is specified such that the absolute scale $\sigma_m$ doubles every $Q$ levels; this doubling interval is called an octave. In this case, the resulting scale increment is $\Delta_\sigma = 2^{1/Q}$ with (typically) $Q = 3, \ldots, 6$.

In addition, a base scale $\sigma_0 > \sigma_s$ is specified for the initial level $G_0$, with $\sigma_s$ denoting the smoothing of the discrete image implied by the sampling process, as discussed already. Based on empirical results, a base scale of $\sigma_0 = 1.6$ is recommended in [153] to achieve reliable interest point detection. Given $Q$ and the base scale $\sigma_0$, the absolute scale at an arbitrary scale space level $G_m$ is

$$ \sigma_m = \sigma_0 \cdot \Delta_\sigma^m = \sigma_0 \cdot 2^{m/Q}, \tag{25.22} $$

for $m = 0, \ldots, M-1$.

As follows from Eqn. (25.18), each scale level $G_m$ can be obtained directly from the discrete input image $I$ by a filter operation

$$ G_m = I * H^{G,\hat\sigma_m}, \tag{25.23} $$

with a Gaussian kernel $H^{G,\hat\sigma_m}$ of width

$$ \hat\sigma_m = \sqrt{\sigma_m^2 - \sigma_s^2} = \sqrt{\sigma_0^2 \cdot 2^{2m/Q} - \sigma_s^2}. \tag{25.24} $$

In particular, the initial scale space level $G_0$ (with the specified base scale $\sigma_0$) is obtained from the discrete input image $I$ by linear filtering using a Gaussian kernel of width

$$ \hat\sigma_0 = \sqrt{\sigma_0^2 - \sigma_s^2}. \tag{25.25} $$

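Alternatively, using the relation $\sigma_m = \sigma_{m-1} \cdot \Delta_\sigma$ (from Eqn. (25.21)), the scale levels $G_1, \ldots, G_{M-1}$ could be calculated recursively from the base level $G_0$ in the form

$$ G_m = G_{m-1} * H^{G,\sigma'_m}, \tag{25.26} $$

for $m > 0$, with a sequence of Gaussian kernels $H^{G,\sigma'_m}$ of width

$$ \sigma'_m = \sqrt{\sigma_m^2 - \sigma_{m-1}^2} = \sigma_0 \cdot 2^{m/Q} \cdot \sqrt{1 - 1/\Delta_\sigma^2}. \tag{25.27} $$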
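The three kernel widths defined so far can be tabulated directly from Eqs. (25.22), (25.24), and (25.27). This is a minimal Java sketch, assuming the parameters $\sigma_s = 0.5$, $\sigma_0 = 1.6$, and $Q = 3$ used in Table 25.2; the class `ScaleTable` and its method names are hypothetical.

```java
/** Hypothetical sketch: kernel widths of Table 25.2 from Eqs. (25.22), (25.24), (25.27). */
final class ScaleTable {

    /** Absolute scale sigma_m = sigma0 * 2^(m/Q), Eq. (25.22). */
    static double absScale(double sigma0, int m, int Q) {
        return sigma0 * Math.pow(2.0, (double) m / Q);
    }

    /** Width of the filter applied directly to I (pre-blurred with sigmaS), Eq. (25.24). */
    static double directWidth(double sigma0, double sigmaS, int m, int Q) {
        double s = absScale(sigma0, m, Q);
        return Math.sqrt(s * s - sigmaS * sigmaS);
    }

    /** Width of the filter applied to level m-1 in the recursive scheme (m > 0), Eq. (25.27). */
    static double recursiveWidth(double sigma0, int m, int Q) {
        double s1 = absScale(sigma0, m, Q);
        double s0 = absScale(sigma0, m - 1, Q);
        return Math.sqrt(s1 * s1 - s0 * s0);
    }

    public static void main(String[] args) {
        double sigmaS = 0.5, sigma0 = 1.6;
        int Q = 3;
        for (int m = 18; m >= 0; m--) {          // same row order as Table 25.2
            String rec = (m > 0) ? String.format("%9.4f", recursiveWidth(sigma0, m, Q)) : "        -";
            System.out.printf("%2d %9.4f %9.4f %s%n",
                    m, absScale(sigma0, m, Q), directWidth(sigma0, sigmaS, m, Q), rec);
        }
    }
}
```

The printed rows can be compared against Table 25.2; the ratio `directWidth / recursiveWidth` also approaches the constant mentioned in footnote 12 as $m$ grows.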


Table 25.2 lists the resulting kernel widths for $Q = 3$ levels per octave and base scale $\sigma_0 = 1.6$ over a scale range of 6 octaves. The




25  Scale-Invariant 
Feature  Transform 

(SIFT) 


value $\hat\sigma_m$ denotes the size of the Gaussian kernel required to compute the image at scale $m$ from the discrete input image $I$ (assumed to be sampled with $\sigma_s = 0.5$). $\sigma'_m$ is the width of the Gaussian kernel to compute level $m$ recursively from the previous level $m-1$. Apparently (though perhaps unexpectedly), the kernel size required for recursive filtering ($\sigma'_m$) grows at the same (exponential) rate as the size $\hat\sigma_m$ of the direct kernel.¹²


Table 25.2

Filter sizes required for calculating Gaussian scale levels $G_m$ for the first 6 octaves. Each octave consists of $Q = 3$ levels, placed at increments of $\Delta_\sigma$ along the scale coordinate. The discrete input image $I$ is assumed to be pre-filtered with $\sigma_s$. Column $\sigma_m$ denotes the absolute scale at level $m$, starting with the specified base offset scale $\sigma_0$. $\hat\sigma_m$ is the width of the Gaussian filter required to calculate level $G_m$ directly from the input image $I$. Values $\sigma'_m$ are the widths of the Gaussian kernels required to calculate level $G_m$ from the previous level $G_{m-1}$. Note that the width of the Gaussian kernels needed for recursive filtering ($\sigma'_m$) grows at the same exponential rate as the size of the direct filter ($\hat\sigma_m$).


At scale level $m = 16$ and absolute scale $\sigma_{16} = 1.6 \cdot 2^{16/3} \approx 64.5$, for example, the Gaussian filter required to compute $G_{16}$ directly from the input image $I$ has the width $\hat\sigma_{16} = (\sigma_{16}^2 - \sigma_s^2)^{1/2} = (64.5080^2 - 0.5^2)^{1/2} \approx 64.5$, while the filter to blur incrementally from the previous scale level has the width $\sigma'_{16} = (\sigma_{16}^2 - \sigma_{15}^2)^{1/2} = (64.5080^2 - 51.2000^2)^{1/2} \approx 39.2$. Since recursive filtering also tends to accrue numerical inaccuracies, this approach does not offer a significant advantage in general. Fortunately, the growth of the Gaussian kernels can be kept small by spatially sub-sampling after each octave, as will be described in Sec. 25.1.4.

The process of constructing a discrete Gaussian scale space using the same parameters as in Table 25.2 is illustrated in Fig. 25.6. Again the input image $I$ is assumed to be pre-filtered at $\sigma_s = 0.5$ due to sampling and the absolute scale of the first level $G_0$ is set to $\sigma_0 = 1.6$. The scale ratio between successive levels is fixed at $\Delta_\sigma = 2^{1/3} \approx 1.25992$, that is, each octave spans three discrete scale levels. As shown in this figure, each scale level $G_m$ can be calculated either directly from the input image $I$ by filtering with a Gaussian of width $\hat\sigma_m$, or recursively from the previous level by filtering with $\sigma'_m$.


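| $m$ | $\sigma_m$ | $\hat\sigma_m$ | $\sigma'_m$ |
|---:|---:|---:|---:|
| 18 | 102.4000 | 102.3988 | 62.2908 |
| 17 | 81.2749 | 81.2734 | 49.4402 |
| 16 | 64.5080 | 64.5060 | 39.2408 |
| 15 | 51.2000 | 51.1976 | 31.1454 |
| 14 | 40.6375 | 40.6344 | 24.7201 |
| 13 | 32.2540 | 32.2501 | 19.6204 |
| 12 | 25.6000 | 25.5951 | 15.5727 |
| 11 | 20.3187 | 20.3126 | 12.3601 |
| 10 | 16.1270 | 16.1192 | 9.8102 |
| 9 | 12.8000 | 12.7902 | 7.7864 |
| 8 | 10.1594 | 10.1471 | 6.1800 |
| 7 | 8.0635 | 8.0480 | 4.9051 |
| 6 | 6.4000 | 6.3804 | 3.8932 |
| 5 | 5.0797 | 5.0550 | 3.0900 |
| 4 | 4.0317 | 4.0006 | 2.4525 |
| 3 | 3.2000 | 3.1607 | 1.9466 |
| 2 | 2.5398 | 2.4901 | 1.5450 |
| 1 | 2.0159 | 1.9529 | 1.2263 |
| 0 | 1.6000 | 1.5199 | - |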

$m$ ... linear scale index
$\sigma_m$ ... absolute scale at level $m$ (Eqn. (25.22))
$\hat\sigma_m$ ... relative scale at level $m$ w.r.t. the original image (Eqn. (25.24))
$\sigma'_m$ ... relative scale at level $m$ w.r.t. the previous level $m-1$ (Eqn. (25.27))

$\sigma_s = 0.5$ (sampling scale)
$\sigma_0 = 1.6$ (base scale)
$Q = 3$ (levels per octave)
$\Delta_\sigma = 2^{1/Q} \approx 1.260$


12 The ratio of the kernel sizes $\hat\sigma_m / \sigma'_m$ converges to $1/\sqrt{1 - 1/\Delta_\sigma^2}$ ($\approx 1.64$ for $Q = 3$) and is thus practically constant for larger values of $m$.
Fig. 25.6

Gaussian scale space construction (first four levels). Parameters are the same as in Table 25.2. The discrete input image $I$ is assumed to be pre-filtered with a Gaussian of width $\sigma_s = 0.5$; the scale of the initial level (base scale offset) is set to $\sigma_0 = 1.6$. The discrete scale space levels $G_0, G_1, \ldots$ (at absolute scales $\sigma_0, \sigma_1, \ldots$) are slices through the continuous scale space. Scale levels can either be calculated by filtering directly from the discrete image $I$ with Gaussian kernels of width $\hat\sigma_0, \hat\sigma_1, \ldots$ (blue arrows) or, alternatively, by recursively filtering with $\sigma'_1, \sigma'_2, \ldots$ (green arrows).


25.1.3  LoG/DoG  Scale  Space 

Interest  point  detection  in  the  SIFT  approach  is  based  on  finding  local 
maxima  in  the  output  of  LoG  filters  over  multiple  scales.  Analogous 
to  the  discrete  Gaussian  scale  space  described  in  Sec.  25.1.2,  a  LoG 
scale  space  representation  of  an  image  I  can  be  defined  as 

$$ \mathbf{L} = (L_0, L_1, \ldots, L_{M-1}), \tag{25.28} $$

with levels $L_m = I * H^{L,\sigma_m}$, where $H^{L,\sigma_m}(x,y) = \hat{L}_{\sigma_m}(x,y)$ is a scale-normalized LoG kernel of width $\sigma_m$ (see Eqn. (25.10)).

As demonstrated in Eqn. (25.12), the LoG kernel can be approximated by the difference of two Gaussians whose widths differ by a certain ratio $\kappa$. Since pairs of adjacent scale layers in the Gaussian scale space are also separated by a fixed scale ratio, it is straightforward to construct a multi-scale DoG representation,

$$ \mathbf{D} = (D_0, D_1, \ldots, D_{M-2}), \tag{25.29} $$

from an existing Gaussian scale space $\mathbf{G} = (G_0, G_1, \ldots, G_{M-1})$. The individual levels in the DoG scale space are defined as

$$ D_m = \lambda \cdot (G_{m+1} - G_m) \approx L_m, \tag{25.30} $$


for $m = 0, \ldots, M-2$. The constant factor $\lambda$ (defined in Eqn. (25.14)) can be omitted in the aforementioned expression, as the relative width of the involved Gaussians,
Fig. 25.7

DoG scale-space construction. The differences of successive levels $G_0, G_1, \ldots$ of the Gaussian scale space (see Fig. 25.6) are used to approximate a LoG scale space. Each DoG level $D_m$ is calculated as the point-wise difference $G_{m+1} - G_m$ between Gaussian levels $G_{m+1}$ and $G_m$. The values in $D_0, \ldots, D_3$ are scale-normalized (see Eqn. (25.14)) and mapped to a uniform intensity range for viewing.


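[Figure: Gaussian levels $G_0, \ldots, G_3$ at absolute scales 1.6000, 2.0159, 2.5398, 3.2000, with the resulting DoG levels $D_0, \ldots, D_3$]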


$$ \kappa = \Delta_\sigma = \frac{\sigma_{m+1}}{\sigma_m} = 2^{1/Q}, \tag{25.31} $$

is simply the fixed scale ratio $\Delta_\sigma$ between successive scale space levels, so $\lambda$ takes the same value for every level $m$ and omitting it scales all DoG levels equally. Note that the DoG approximation does not require any additional normalization to approximate a scale-normalized LoG representation (see Eqns. (25.10) and (25.14)). The process of calculating a DoG scale space from a discrete Gaussian scale space is illustrated in Fig. 25.7, using the same parameters as in Table 25.2 and Fig. 25.6.
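The construction of Eqs. (25.23) and (25.30) can be sketched in a few lines of Java. This is a minimal illustration under stated assumptions, not the book's implementation: the class `DogStack` is hypothetical, images are stored as `double[rows][cols]` (i.e., `img[v][u]`), the 2D Gaussian is applied separably with clamped borders, and kernels are truncated at radius ⌈3σ⌉. Each level is filtered directly from $I$ with $\hat\sigma_m$, and the constant factor $\lambda$ is omitted.

```java
/** Hypothetical sketch of a non-hierarchical Gaussian/DoG stack (images are img[v][u]). */
final class DogStack {

    static double[] kernel(double sigma) {
        int r = Math.max(1, (int) Math.ceil(3 * sigma));   // assumed truncation radius
        double[] h = new double[2 * r + 1];
        double sum = 0;
        for (int i = -r; i <= r; i++) {
            h[i + r] = Math.exp(-(i * i) / (2 * sigma * sigma));
            sum += h[i + r];
        }
        for (int i = 0; i < h.length; i++) h[i] /= sum;
        return h;
    }

    /** Separable 2D Gaussian filter with clamped borders: horizontal, then vertical pass. */
    static double[][] blur(double[][] img, double sigma) {
        double[] h = kernel(sigma);
        int r = h.length / 2, rows = img.length, cols = img[0].length;
        double[][] tmp = new double[rows][cols], out = new double[rows][cols];
        for (int v = 0; v < rows; v++)
            for (int u = 0; u < cols; u++) {
                double s = 0;
                for (int i = -r; i <= r; i++)
                    s += h[i + r] * img[v][Math.min(cols - 1, Math.max(0, u + i))];
                tmp[v][u] = s;
            }
        for (int v = 0; v < rows; v++)
            for (int u = 0; u < cols; u++) {
                double s = 0;
                for (int i = -r; i <= r; i++)
                    s += h[i + r] * tmp[Math.min(rows - 1, Math.max(0, v + i))][u];
                out[v][u] = s;
            }
        return out;
    }

    /** G_m = I * H^{G, hatSigma_m} for m = 0..M-1, filtered directly from I (Eqs. 25.23, 25.24). */
    static double[][][] gaussianStack(double[][] img, double sigmaS, double sigma0, int Q, int M) {
        double[][][] g = new double[M][][];
        for (int m = 0; m < M; m++) {
            double sm = sigma0 * Math.pow(2.0, (double) m / Q);
            g[m] = blur(img, Math.sqrt(sm * sm - sigmaS * sigmaS));
        }
        return g;
    }

    /** D_m = G_{m+1} - G_m for m = 0..M-2 (Eq. 25.30 without the constant factor). */
    static double[][][] dogStack(double[][][] g) {
        double[][][] d = new double[g.length - 1][][];
        for (int m = 0; m < d.length; m++) {
            int rows = g[m].length, cols = g[m][0].length;
            d[m] = new double[rows][cols];
            for (int v = 0; v < rows; v++)
                for (int u = 0; u < cols; u++)
                    d[m][v][u] = g[m + 1][v][u] - g[m][v][u];
        }
        return d;
    }
}
```

A constant image yields all-zero DoG levels, while a single bright pixel produces a negative center response in $D_0$, because the wider Gaussian of $G_1$ spreads the peak further than that of $G_0$.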

25.1.4  Hierarchical  Scale  Space 

Despite the fact that 2D Gaussian filter kernels are separable into 1D kernels,¹³ the size of the required filter grows quickly with increasing scale, regardless of whether a direct or recursive approach is used (as shown in Table 25.2). However, each Gaussian filter operation reduces the bandwidth of the signal in inverse proportion to the width of the kernel (see Sec. E.3 in the Appendix). If the image size is kept constant over all scales, the images become increasingly oversampled at higher scale levels. In other words, the sampling rate in a Gaussian scale space can be reduced with increasing scale without losing relevant signal information.
13 See also Chapter 5, Sec. 5.3.3.


Octaves and sub-sampling (decimation)

In particular, doubling the scale cuts the bandwidth by half, that is, the signal at scale level $2\sigma$ of a Gaussian scale space has only half the bandwidth of the same signal at level $\sigma$. In a Gaussian scale space representation it is thus safe to down-sample the image to half the sample rate after each octave without losing relevant information. This suggests a very efficient, “pyramid-like” approach for constructing a DoG scale space, as illustrated in Fig. 25.8.¹⁴

At the start (bottom) of each octave, the image is down-sampled to half the resolution, that is, each pixel in the new octave covers twice the distance of the pixels in the previous octave in every spatial direction. Within each octave, the same small Gaussian kernels can be used for successive filtering, since their relative widths (with respect to the original sampling lattice) also implicitly double at each octave. To describe these relations formally, we use

$$ \mathbf{G} = (\mathbf{G}_0, \mathbf{G}_1, \ldots, \mathbf{G}_{P-1}) \tag{25.32} $$

to denote a hierarchical Gaussian scale space consisting of $P$ octaves.

Each octave

$$ \mathbf{G}_p = (G_{p,0}, G_{p,1}, \ldots, G_{p,Q}) \tag{25.33} $$

consists of $Q+1$ scale levels $G_{p,q}$, where $p \in [0, P-1]$ is the octave index and $q \in [0, Q]$ is the level index within the containing octave $\mathbf{G}_p$. With respect to absolute scale, a level $G_{p,q} = \mathbf{G}_p(q)$ in the hierarchical Gaussian scale space corresponds to the level $G_m$ in the non-hierarchical Gaussian scale space (see Eqn. (25.20)) with index

$$ m = Q \cdot p + q. \tag{25.34} $$


As follows from Eqn. (25.22), the absolute scale at level $G_{p,q}$ then is

$$ \sigma_{p,q} = \sigma_m = \sigma_0 \cdot \Delta_\sigma^m = \sigma_0 \cdot 2^{m/Q} = \sigma_0 \cdot 2^{(Qp+q)/Q} = \sigma_0 \cdot 2^{p + q/Q}, \tag{25.35} $$

where $\sigma_0 = \sigma_{0,0}$ denotes the predefined base scale offset (e.g., $\sigma_0 = 1.6$ in Table 25.2). In particular, the absolute scale of the base level $G_{p,0}$ of any octave $\mathbf{G}_p$ is

$$ \sigma_{p,0} = \sigma_0 \cdot 2^p. \tag{25.36} $$

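The decimated scale $\dot\sigma_{p,q}$ is the absolute scale $\sigma_{p,q}$ (Eqn. (25.35)) expressed in the coordinate units of octave $\mathbf{G}_p$, that is,

$$ \dot\sigma_{p,q} = \frac{\sigma_{p,q}}{2^p} = \sigma_{p,q} \cdot 2^{-p} = \sigma_0 \cdot 2^{p+q/Q} \cdot 2^{-p} = \sigma_0 \cdot 2^{q/Q}. \tag{25.37} $$

Note that the decimated scale $\dot\sigma_{p,q}$ is independent of the octave index $p$ and therefore $\dot\sigma_{p,q} = \dot\sigma_q$, for any level index $q$.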
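The index and scale relations of Eqs. (25.34), (25.35), and (25.37) are easy to verify in code. A minimal Java sketch follows; the class `OctaveScales` and its method names are hypothetical, chosen only for illustration.

```java
/** Hypothetical sketch of Eqs. (25.34), (25.35), and (25.37). */
final class OctaveScales {

    /** Linear scale index m = Q*p + q, Eq. (25.34). */
    static int linearIndex(int p, int q, int Q) {
        return Q * p + q;
    }

    /** Absolute scale sigma_{p,q} = sigma0 * 2^(p + q/Q), Eq. (25.35). */
    static double absScale(int p, int q, double sigma0, int Q) {
        return sigma0 * Math.pow(2.0, p + (double) q / Q);
    }

    /** Decimated scale: absolute scale in the pixel units of octave p, Eq. (25.37). */
    static double decimatedScale(int p, int q, double sigma0, int Q) {
        return absScale(p, q, sigma0, Q) / Math.pow(2.0, p);
    }
}
```

For instance, level $(p, q) = (3, 1)$ has linear index $m = 10$ and absolute scale of about 16.1270, while the decimated scale for a given $q$ comes out the same in every octave.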
14 Successive reduction of image resolution by sub-sampling is the core concept of “image pyramid” methods [41].
Fig. 25.8

Hierarchical Gaussian scale space. Each octave extends over $Q = 3$ scale steps. The base level $G_{p,0}$ of each octave $p > 0$ is obtained by 2:1 sub-sampling of the top level $G_{p-1,3}$ of the next-lower octave. At the transition between octaves, the resolution (image size) is cut in half in the $x$- and $y$-direction. The absolute scale at octave level $G_{p,q}$ is $\sigma_m$, with $m = Qp + q$. Within each octave, the same set of Gaussian kernels ($\tilde\sigma_1, \tilde\sigma_2, \tilde\sigma_3$) is used to calculate the following levels from the octave's base level $G_{p,0}$.


From the octave's base level $G_{p,0}$, the subsequent levels in the same octave can be calculated by filtering with relatively small Gaussian kernels. The size of the kernel needed to calculate scale level $G_{p,q}$ from the octave's base level $G_{p,0}$ is obtained from the corresponding decimated scales (Eqn. (25.37)) as

$$ \tilde\sigma_{p,q} = \sqrt{\dot\sigma_{p,q}^2 - \dot\sigma_{p,0}^2} = \sqrt{(\sigma_0 \cdot 2^{q/Q})^2 - \sigma_0^2} = \sigma_0 \cdot \sqrt{2^{2q/Q} - 1}, \tag{25.38} $$

for $q > 0$. Note that $\tilde\sigma_{p,q} = \tilde\sigma_q$ is independent of the octave index $p$ and thus the same filter kernels can be used at each octave. For example, with $Q = 3$ and $\sigma_0 = 1.6$ (as used in Table 25.2) the resulting kernel widths are

$$ \tilde\sigma_1 = 1.2263, \quad \tilde\sigma_2 = 1.9725, \quad \tilde\sigma_3 = 2.7713. \tag{25.39} $$
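Because $\tilde\sigma_q$ does not depend on $p$, the widths can be computed once and reused for every octave. The following minimal Java sketch evaluates Eq. (25.38); the class and method names are hypothetical.

```java
/** Hypothetical sketch: relative kernel widths within an octave, Eq. (25.38). */
final class OctaveKernels {

    /**
     * Returns w with w[q] = sigma0 * sqrt(2^(2q/Q) - 1) for q = 1..Q.
     * w[0] stays 0 because G_{p,0} is the base level itself.
     */
    static double[] relativeWidths(double sigma0, int Q) {
        double[] w = new double[Q + 1];
        for (int q = 1; q <= Q; q++) {
            w[q] = sigma0 * Math.sqrt(Math.pow(2.0, 2.0 * q / Q) - 1.0);
        }
        return w;
    }

    public static void main(String[] args) {
        double[] w = relativeWidths(1.6, 3);
        for (int q = 1; q < w.length; q++) {
            System.out.printf("sigmaTilde_%d = %.4f%n", q, w[q]);
        }
    }
}
```

With $\sigma_0 = 1.6$ and $Q = 3$ the printed values match Eq. (25.39), and combining $\sigma_0$ with $\tilde\sigma_q$ in quadrature recovers the decimated scale $\sigma_0 \cdot 2^{q/Q}$.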

Also note that, instead of filtering all scale levels $G_{p,q}$ in an octave from the corresponding base level $G_{p,0}$, we could calculate them recursively from the next-lower level $G_{p,q-1}$. While this approach requires even smaller Gaussian kernels (and is thus more efficient), recursive filtering tends to accrue numerical inaccuracies. Nevertheless, the method is used frequently in scale-space implementations.

Decimation  between  successive  octaves 

With $M \times N$ being the size of the original image $I$, every sub-sampling step between octaves cuts the size of the image by half, that is,

$$ M_{p+1} \times N_{p+1} = \left\lfloor \frac{M_p}{2} \right\rfloor \times \left\lfloor \frac{N_p}{2} \right\rfloor, \tag{25.40} $$

for octaves with index $p \geq 0$. The resulting image size at octave $\mathbf{G}_p$ is thus

$$ M_p \times N_p = \left\lfloor \frac{M_0}{2^p} \right\rfloor \times \left\lfloor \frac{N_0}{2^p} \right\rfloor, \tag{25.41} $$

with $M_0 \times N_0 = M \times N$.


The base level $G_{p,0}$ of each octave $\mathbf{G}_p$ (with $p > 0$) is obtained by sub-sampling the top level $G_{p-1,Q}$ of the next-lower octave $\mathbf{G}_{p-1}$ as

$$ G_{p,0} = \mathrm{Decimate}(G_{p-1,Q}), \tag{25.42} $$

where $\mathrm{Decimate}(G)$ denotes the 2:1 sub-sampling operation, that is,

$$ G_{p,0}(u,v) \leftarrow G_{p-1,Q}(2u, 2v), \tag{25.43} $$

for each sample position $(u,v) \in [0, M_p - 1] \times [0, N_p - 1]$. Additional low-pass filtering is not required prior to sub-sampling since the Gaussian smoothing performed in each octave also cuts the bandwidth by half.
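The 2:1 sub-sampling of Eq. (25.43), together with the size rule of Eqs. (25.40) and (25.41), can be sketched in Java as follows. The class `Decimation` is hypothetical; it assumes images are stored as `double[rows][cols]`, so `in.length` is the height $N$ and `in[0].length` the width $M$.

```java
/**
 * Hypothetical sketch of Decimate, Eq. (25.43): keep every second sample in
 * each direction. Images are stored as img[v][u] (rows x columns).
 */
final class Decimation {

    static double[][] decimate(double[][] in) {
        int rows = in.length / 2;              // floor(N/2)
        int cols = in[0].length / 2;           // floor(M/2)
        double[][] out = new double[rows][cols];
        for (int v = 0; v < rows; v++) {
            for (int u = 0; u < cols; u++) {
                out[v][u] = in[2 * v][2 * u];  // G'(u,v) <- G(2u, 2v)
            }
        }
        return out;
    }
}
```

Applying `decimate` three times to a 400 × 300 image yields a 50 × 37 image, consistent with the floor operation in Eq. (25.41).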

The main steps involved in constructing a hierarchical Gaussian scale space are summarized in Alg. 25.1. Specifically, the input image $I$ is first blurred to scale $\sigma_0$ by filtering with a Gaussian kernel of width $\hat\sigma_0$. Within each octave $\mathbf{G}_p$, the scale levels $G_{p,q}$ are calculated from the base level $G_{p,0}$ by filtering with a set of Gaussian filters of width $\tilde\sigma_q$ ($q = 1, \ldots, Q$). Note that the values $\tilde\sigma_q$ and the corresponding Gaussian kernels $H^{G,\tilde\sigma_q}$ can be pre-calculated once since they are independent of the octave index $p$ (Alg. 25.1, lines 13-14). The base level $G_{p,0}$ of each higher octave $\mathbf{G}_p$ is obtained by decimating the top level $G_{p-1,Q}$ of the previous octave $\mathbf{G}_{p-1}$. Typical parameter values are $\sigma_s = 0.5$, $\sigma_0 = 1.6$, $Q = 3$, $P = 4$.
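A compact Java rendering of Alg. 25.1 may help connect the steps. This is a sketch under stated assumptions, not the book's library code: the class `HierarchicalGaussianScaleSpace` is hypothetical, images are `double[rows][cols]`, the 2D Gaussian is applied separably with clamped borders, and kernels are truncated at radius ⌈3σ⌉. As suggested above, the widths $\tilde\sigma_q$ are computed once and reused in every octave.

```java
/**
 * Hypothetical sketch of Alg. 25.1 (BuildGaussianScaleSpace). Images are
 * double[rows][cols], i.e., img[v][u]. Separable filtering with clamped
 * borders and a kernel radius of ceil(3*sigma) are assumptions.
 */
final class HierarchicalGaussianScaleSpace {

    static double[] kernel(double sigma) {
        int r = Math.max(1, (int) Math.ceil(3 * sigma));
        double[] h = new double[2 * r + 1];
        double sum = 0;
        for (int i = -r; i <= r; i++) {
            h[i + r] = Math.exp(-(i * i) / (2 * sigma * sigma));
            sum += h[i + r];
        }
        for (int i = 0; i < h.length; i++) {
            h[i] /= sum;
        }
        return h;
    }

    /** Separable 2D Gaussian filter I * H^{G,sigma}: horizontal, then vertical pass. */
    static double[][] blur(double[][] img, double sigma) {
        double[] h = kernel(sigma);
        int r = h.length / 2, rows = img.length, cols = img[0].length;
        double[][] tmp = new double[rows][cols];
        double[][] out = new double[rows][cols];
        for (int v = 0; v < rows; v++) {
            for (int u = 0; u < cols; u++) {
                double s = 0;
                for (int i = -r; i <= r; i++) {
                    s += h[i + r] * img[v][Math.min(cols - 1, Math.max(0, u + i))];
                }
                tmp[v][u] = s;
            }
        }
        for (int v = 0; v < rows; v++) {
            for (int u = 0; u < cols; u++) {
                double s = 0;
                for (int i = -r; i <= r; i++) {
                    s += h[i + r] * tmp[Math.min(rows - 1, Math.max(0, v + i))][u];
                }
                out[v][u] = s;
            }
        }
        return out;
    }

    /** 2:1 sub-sampling, Eq. (25.43). */
    static double[][] decimate(double[][] in) {
        double[][] out = new double[in.length / 2][in[0].length / 2];
        for (int v = 0; v < out.length; v++) {
            for (int u = 0; u < out[0].length; u++) {
                out[v][u] = in[2 * v][2 * u];
            }
        }
        return out;
    }

    /** Returns g[p][q] = G_{p,q} for p = 0..P-1 and q = 0..Q. */
    static double[][][][] build(double[][] img, double sigmaS, double sigma0, int P, int Q) {
        double[] sigmaTilde = new double[Q + 1];                 // Eq. (25.38), same for all p
        for (int q = 1; q <= Q; q++) {
            sigmaTilde[q] = sigma0 * Math.sqrt(Math.pow(2.0, 2.0 * q / Q) - 1.0);
        }
        double sigmaHat0 = Math.sqrt(sigma0 * sigma0 - sigmaS * sigmaS);  // Eq. (25.25)
        double[][][][] g = new double[P][Q + 1][][];
        for (int p = 0; p < P; p++) {
            g[p][0] = (p == 0) ? blur(img, sigmaHat0)           // Alg. 25.1, line 3
                               : decimate(g[p - 1][Q]);         // line 6, Eq. (25.42)
            for (int q = 1; q <= Q; q++) {
                g[p][q] = blur(g[p][0], sigmaTilde[q]);         // line 14
            }
        }
        return g;
    }
}
```

For a 64 × 48 input with $P = Q = 3$, the levels of octave 2 are 16 × 12 images; a constant input stays constant at every level because all kernels are normalized.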


Spatial  positions  in  the  hierarchical  scale  space 

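To properly associate the spatial positions of features detected in different octaves of the hierarchical scale space we define the function

$$ \boldsymbol{x}_0 \leftarrow \mathrm{AbsPos}(\boldsymbol{x}_p, p), $$

that maps the continuous position $\boldsymbol{x}_p = (x_p, y_p)$ in the local coordinate system of octave $p$ to the corresponding position $\boldsymbol{x}_0 = (x_0, y_0)$ in the coordinate system of the original full-resolution image $I$ (octave $p = 0$). The function AbsPos can be defined recursively by relating the positions in successive octaves as

$$ \mathrm{AbsPos}(\boldsymbol{x}_p, p) = \begin{cases} \boldsymbol{x}_p & \text{for } p = 0, \\ \mathrm{AbsPos}(2 \cdot \boldsymbol{x}_p, p-1) & \text{for } p > 0, \end{cases} \tag{25.44} $$

which gives $\boldsymbol{x}_0 = \mathrm{AbsPos}(2^p \cdot \boldsymbol{x}_p, 0)$ and thus

$$ \mathrm{AbsPos}(\boldsymbol{x}_p, p) = 2^p \cdot \boldsymbol{x}_p. \tag{25.45} $$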
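Both forms of AbsPos translate directly into code. The minimal Java sketch below implements the recursive definition of Eq. (25.44) and the closed form of Eq. (25.45) side by side. The class `OctavePositions` and the `{x, y}` array representation of positions are assumptions made for illustration.

```java
/** Hypothetical sketch of AbsPos, Eqs. (25.44) and (25.45); positions are {x, y}. */
final class OctavePositions {

    /** Recursive definition, Eq. (25.44). */
    static double[] absPosRecursive(double[] xp, int p) {
        if (p == 0) {
            return xp.clone();
        }
        return absPosRecursive(new double[] {2 * xp[0], 2 * xp[1]}, p - 1);
    }

    /** Closed form, Eq. (25.45): x0 = 2^p * xp. */
    static double[] absPos(double[] xp, int p) {
        double f = Math.pow(2.0, p);
        return new double[] {f * xp[0], f * xp[1]};
    }
}
```

A feature at the sub-pixel position (3.25, 1.5) in octave 2, for example, maps to (13.0, 6.0) in the full-resolution image under either version.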

Hierarchical  LoG/DoG  scale  space 

Analogous to the scheme shown in Fig. 25.7, a hierarchical DoG scale space representation is obtained by calculating the difference of adjacent scale levels within each octave of the hierarchical Gaussian scale space, that is,
Alg. 25.1

Building a hierarchical Gaussian scale space. The input image $I$ is first blurred to scale $\sigma_0$ by filtering with a Gaussian kernel of width $\hat\sigma_0$ (line 3). In each octave $\mathbf{G}_p$, the scale levels $G_{p,q}$ are calculated from the base level $G_{p,0}$ by filtering with a set of Gaussian filters of width $\tilde\sigma_q$ (lines 13-14). The base level $G_{p,0}$ of each higher octave is obtained by sub-sampling the top level $G_{p-1,Q}$ of the previous octave (line 6).
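1: BuildGaussianScaleSpace($I, \sigma_s, \sigma_0, P, Q$)
   Input: $I$, source image; $\sigma_s$, sampling scale; $\sigma_0$, reference scale of the first octave; $P$, number of octaves; $Q$, number of scale steps per octave. Returns a hierarchical Gaussian scale space representation $\mathbf{G}$ of the image $I$.
2:     $\hat\sigma_0 \leftarrow (\sigma_0^2 - \sigma_s^2)^{1/2}$    ▷ scale to base of 1st octave, Eq. 25.25
3:     $G_{\mathrm{init}} \leftarrow I * H^{G,\hat\sigma_0}$    ▷ apply 2D Gaussian filter of width $\hat\sigma_0$
4:     $\mathbf{G}_0 \leftarrow$ MakeGaussianOctave($G_{\mathrm{init}}, 0, Q, \sigma_0$)    ▷ create octave $\mathbf{G}_0$
5:     for $p \leftarrow 1, \ldots, P-1$ do    ▷ octave index $p$
6:         $G_{\mathrm{next}} \leftarrow$ Decimate($G_{p-1,Q}$)    ▷ dec. top level of octave $p-1$
7:         $\mathbf{G}_p \leftarrow$ MakeGaussianOctave($G_{\mathrm{next}}, p, Q, \sigma_0$)    ▷ create octave $\mathbf{G}_p$
8:     $\mathbf{G} \leftarrow (\mathbf{G}_0, \ldots, \mathbf{G}_{P-1})$
9:     return $\mathbf{G}$    ▷ hierarchical Gaussian scale space $\mathbf{G}$

10: MakeGaussianOctave($G_{\mathrm{base}}, p, Q, \sigma_0$)
    Input: $G_{\mathrm{base}}$, octave base level; $p$, octave index; $Q$, number of levels per octave; $\sigma_0$, reference scale.
11:     $G_{p,0} \leftarrow G_{\mathrm{base}}$
12:     for $q \leftarrow 1, \ldots, Q$ do    ▷ level index $q$
13:         $\tilde\sigma_q \leftarrow \sigma_0 \cdot \sqrt{2^{2q/Q} - 1}$    ▷ see Eq. 25.38
14:         $G_{p,q} \leftarrow G_{\mathrm{base}} * H^{G,\tilde\sigma_q}$    ▷ apply 2D Gaussian filter of width $\tilde\sigma_q$
15:     $\mathbf{G}_p \leftarrow (G_{p,0}, \ldots, G_{p,Q})$
16:     return $\mathbf{G}_p$    ▷ scale space octave $\mathbf{G}_p$

17: Decimate($G_{\mathrm{in}}$)
    Input: $G_{\mathrm{in}}$, Gaussian scale space level.
18:     $(M, N) \leftarrow$ Size($G_{\mathrm{in}}$)
19:     $M' \leftarrow \lfloor M/2 \rfloor$, $N' \leftarrow \lfloor N/2 \rfloor$    ▷ decimated size
20:     Create map $G_{\mathrm{out}}: M' \times N' \mapsto \mathbb{R}$
21:     for all $(u, v) \in M' \times N'$ do
22:         $G_{\mathrm{out}}(u,v) \leftarrow G_{\mathrm{in}}(2u, 2v)$    ▷ 2:1 subsampling
23:     return $G_{\mathrm{out}}$    ▷ decimated scale level $G_{\mathrm{out}}$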

$$ D_{p,q} = G_{p,q+1} - G_{p,q}, \tag{25.46} $$

for level numbers $q \in [0, Q-1]$. Figure 25.9 shows the corresponding Gaussian and DoG scale levels for the previous example over a range of three octaves. To demonstrate the effects of sub-sampling, the same information is shown in Figs. 25.10 and 25.11, with all level images scaled to the same size. Figure 25.11 also shows the absolute values of the DoG response, which are what is actually used for detecting interest points at different scale levels. Note how blob-like features stand out and disappear again as the scale varies from fine to coarse. Analogous results obtained from a different image are shown in Figs. 25.12 and 25.13.

25.1.5  Scale  Space  Structure  in  SIFT 

In the SIFT approach, the absolute value of the DoG response is used to localize interest points at different scales. For this purpose, local maxima are detected in the 3D space spanned by the spatial $x/y$-positions and the scale coordinate. To determine local maxima along the scale dimension over a full octave, two additional DoG levels, $D_{p,-1}$ and $D_{p,Q}$, and two additional Gaussian scale levels, $G_{p,-1}$ and $G_{p,Q+1}$, are required in each octave.

Fig. 25.9

Hierarchical Gaussian and DoG scale space example, with $P = Q = 3$. Gaussian scale space levels $G_{p,q}$ are shown in the left column, DoG levels $D_{p,q}$ in the right column. All images are shown at their real scale.

[Figure panels: Gaussian scale space (left) and DoG scale space (right); Octave $\mathbf{G}_1$ (200 × 150), Octave $\mathbf{G}_2$ (100 × 75)]

In total, each octave $\mathbf{G}_p$ then consists of $Q+3$ Gaussian scale levels $G_{p,q}$ ($q = -1, \ldots, Q+1$) and $Q+2$ DoG levels $D_{p,q}$ ($q = -1, \ldots, Q$), as shown in Fig. 25.14. For the base level $G_{0,-1}$, the scale index is $m = -1$ and its absolute scale (see Eqns. (25.22) and (25.35)) is

$$ \sigma_{0,-1} = \sigma_0 \cdot 2^{-1/Q} = \frac{\sigma_0}{2^{1/Q}}. \tag{25.47} $$

Thus, with the usual settings ($\sigma_0 = 1.6$ and $Q = 3$), the absolute scale values for the six levels of the first octave are

$$ \begin{aligned} \sigma_{0,-1} &= 1.2699, & \sigma_{0,0} &= 1.6000, & \sigma_{0,1} &= 2.0159, \\ \sigma_{0,2} &= 2.5398, & \sigma_{0,3} &= 3.2000, & \sigma_{0,4} &= 4.0317. \end{aligned} \tag{25.48} $$

Fig. 25.10

Hierarchical Gaussian scale space example (castle image). All images are scaled to the same size. Note that $G_{1,0}$ is merely a sub-sampled copy of $G_{0,3}$; analogously, $G_{2,0}$ is sub-sampled from $G_{1,3}$.


The  complete  set  of  scale  values  for  a  SIFT  scale  space  with  four 
octaves  (p  =  0, . . . ,  3)  is  listed  in  Table  25.3. 

To construct the Gaussian part of the first scale space octave $\mathbf{G}_0$, the initial level $G_{0,-1}$ is obtained by filtering the input image $I$ with a Gaussian kernel of width

$$ \hat\sigma_{0,-1} = \sqrt{\sigma_{0,-1}^2 - \sigma_s^2} = \sqrt{1.2699^2 - 0.5^2} \approx 1.1673. \tag{25.49} $$
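The two numbers in Eqs. (25.47) and (25.49) follow from a two-line computation. A minimal Java sketch, with the hypothetical class name `SiftInitialLevel`:

```java
/** Hypothetical sketch: scale of the initial SIFT level G_{0,-1}, Eqs. (25.47) and (25.49). */
final class SiftInitialLevel {

    /** Absolute scale sigma_{0,-1} = sigma0 * 2^(-1/Q), Eq. (25.47). */
    static double initAbsScale(double sigma0, int Q) {
        return sigma0 * Math.pow(2.0, -1.0 / Q);
    }

    /** Width of the Gaussian applied to I (pre-blurred with sigmaS), Eq. (25.49). */
    static double initFilterWidth(double sigma0, double sigmaS, int Q) {
        double s = initAbsScale(sigma0, Q);
        return Math.sqrt(s * s - sigmaS * sigmaS);
    }
}
```

With $\sigma_0 = 1.6$, $\sigma_s = 0.5$, and $Q = 3$, the two methods return approximately 1.2699 and 1.1673, respectively.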


For the higher octaves ($p > 0$), the initial level ($q = -1$) is obtained by sub-sampling (decimating) level $Q-1$ of the next-lower octave $\mathbf{G}_{p-1}$, that is,

$$ G_{p,-1} \leftarrow \mathrm{Decimate}(G_{p-1,Q-1}), \tag{25.50} $$


Fig. 25.11

Hierarchical DoG scale space example (castle image). The three top rows show the positive and negative DoG values (zero is mapped to intermediate gray). The three bottom rows show the absolute values of the DoG results (zero is mapped to black, maximum values to white). All images are scaled to the size of the original image.

[Figure panels: Octave $\mathbf{D}_0$ (400 × 300), Octave $\mathbf{D}_1$ (200 × 150), Octave $\mathbf{D}_2$ (100 × 75)]


analogous to Eqn. (25.42). The remaining levels $G_{p,0}, \ldots, G_{p,Q+1}$ of the octave are either calculated by incremental filtering (as described in Fig. 25.6) or by filtering from the octave's initial level $G_{p,-1}$ with a Gaussian of width $\tilde\sigma_{p,q}$ (see Eqn. (25.38)). The advantage of the direct approach is that numerical errors do not accrue across the scale space; the disadvantage is that the kernels are up to about 56 % larger than those needed for the incremental approach ($\tilde\sigma_{0,4} = 3.8265$ vs.
Fig. 25.12

Hierarchical Gaussian scale space example (stars image).

[Figure panels: Octave $\mathbf{G}_0$ (400 × 300), Octave $\mathbf{G}_1$ (200 × 150), Octave $\mathbf{G}_2$ (100 × 75); levels $G_{p,0}, \ldots, G_{p,3}$ in each octave]

Table 25.3

Absolute and relative scale values for a SIFT scale space with four octaves. Each octave with index $p = 0, \ldots, 3$ consists of 6 Gaussian scale layers $G_{p,q}$, with $q = -1, \ldots, 4$. For each scale layer, $m$ is the scale index and $\sigma_{p,q}$ is the corresponding absolute scale. Within each octave $p$, $\tilde\sigma_{p,q}$ denotes the relative scale with respect to the octave's base layer $G_{p,-1}$. Each base layer $G_{p,-1}$ is obtained by sub-sampling (decimating) layer $q = Q - 1 = 2$ in the previous octave, i.e., $G_{p,-1} = \mathrm{Decimate}(G_{p-1,Q-1})$, for $p > 0$. The base layer $G_{0,-1}$ in the bottom octave is derived by Gaussian smoothing of the original image. Note that the relative scale values $\tilde\sigma_{p,q} = \tilde\sigma_q$ are the same inside every octave (independent of $p$) and thus the same Gaussian filter kernels can be used for calculating all octaves.


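| $p$ | $q$ | $m$ | $d$ | $\sigma_{p,q}$ | $\dot\sigma_q$ | $\tilde\sigma_q$ |
|---:|---:|---:|---:|---:|---:|---:|
| 3 | 4 | 13 | 8 | 32.2540 | 4.0317 | 3.8265 |
| 3 | 3 | 12 | 8 | 25.6000 | 3.2000 | 2.9372 |
| 3 | 2 | 11 | 8 | 20.3187 | 2.5398 | 2.1996 |
| 3 | 1 | 10 | 8 | 16.1270 | 2.0159 | 1.5656 |
| 3 | 0 | 9 | 8 | 12.8000 | 1.6000 | 0.9733 |
| 3 | -1 | 8 | 8 | 10.1594 | 1.2699 | 0.0000 |
| 2 | 4 | 10 | 4 | 16.1270 | 4.0317 | 3.8265 |
| 2 | 3 | 9 | 4 | 12.8000 | 3.2000 | 2.9372 |
| 2 | 2 | 8 | 4 | 10.1594 | 2.5398 | 2.1996 |
| 2 | 1 | 7 | 4 | 8.0635 | 2.0159 | 1.5656 |
| 2 | 0 | 6 | 4 | 6.4000 | 1.6000 | 0.9733 |
| 2 | -1 | 5 | 4 | 5.0797 | 1.2699 | 0.0000 |
| 1 | 4 | 7 | 2 | 8.0635 | 4.0317 | 3.8265 |
| 1 | 3 | 6 | 2 | 6.4000 | 3.2000 | 2.9372 |
| 1 | 2 | 5 | 2 | 5.0797 | 2.5398 | 2.1996 |
| 1 | 1 | 4 | 2 | 4.0317 | 2.0159 | 1.5656 |
| 1 | 0 | 3 | 2 | 3.2000 | 1.6000 | 0.9733 |
| 1 | -1 | 2 | 2 | 2.5398 | 1.2699 | 0.0000 |
| 0 | 4 | 4 | 1 | 4.0317 | 4.0317 | 3.8265 |
| 0 | 3 | 3 | 1 | 3.2000 | 3.2000 | 2.9372 |
| 0 | 2 | 2 | 1 | 2.5398 | 2.5398 | 2.1996 |
| 0 | 1 | 1 | 1 | 2.0159 | 2.0159 | 1.5656 |
| 0 | 0 | 0 | 1 | 1.6000 | 1.6000 | 0.9733 |
| 0 | -1 | -1 | 1 | 1.2699 | 1.2699 | 0.0000 |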

$p$ ... octave index
$q$ ... level index
$m$ ... linear scale index ($m = Qp + q$)
$d$ ... decimation factor ($d = 2^p$)
$\sigma_{p,q}$ ... absolute scale (Eqn. (25.35))
$\dot\sigma_q$ ... decimated scale (Eqn. (25.37))
$\tilde\sigma_q$ ... relative decimated scale w.r.t. the octave's base level $G_{p,-1}$ (Eqn. (25.38))

$P = 4$ (number of octaves)
$Q = 3$ (levels per octave)
$\sigma_0 = 1.6$ (base scale)
Fig. 25.13

Hierarchical DoG scale space example (stars image). The three top rows show the positive and negative DoG values (zero is mapped to intermediate gray). The three bottom rows show the absolute values of the DoG results (zero is mapped to black, maximum values to white). All images are scaled to the size of the original image.

[Figure panels: Octave $\mathbf{D}_0$ (400 × 300), Octave $\mathbf{D}_1$ (200 × 150), Octave $\mathbf{D}_2$ (100 × 75); levels $D_{p,0}, \ldots, D_{p,2}$ in each octave]


$\sigma'_{0,4} = 2.4525$). Note that the inner levels $G_{p,q}$ of all higher octaves (i.e., $p > 0$, $q \geq 0$) are calculated from the base level $G_{p,-1}$, using the same set of kernels as for the first octave, as listed in Table 25.3. The complete process of building a SIFT scale space is summarized in Alg. 25.2.
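The trade-off between direct and incremental filtering within a SIFT octave can be quantified from the decimated scales alone. The following Java sketch is a minimal illustration with the hypothetical class `SiftOctaveKernels`. `relative(q)` is the width applied to the octave's initial level $G_{p,-1}$, which reproduces the last column of Table 25.3. `incremental(q)` is the width applied to the previous level $G_{p,q-1}$. Neither depends on the octave index $p$.

```java
/**
 * Hypothetical sketch: kernel widths for a SIFT octave with levels q = -1..Q+1,
 * computed from the decimated scales sigma0 * 2^(q/Q) (independent of p).
 */
final class SiftOctaveKernels {

    /** Decimated scale sigma0 * 2^(q/Q), Eq. (25.37). */
    static double decimatedScale(double sigma0, int q, int Q) {
        return sigma0 * Math.pow(2.0, (double) q / Q);
    }

    /** Width w.r.t. G_{p,-1}: sigma0 * sqrt(2^(2q/Q) - 2^(-2/Q)), for q >= -1. */
    static double relative(double sigma0, int q, int Q) {
        double a = decimatedScale(sigma0, q, Q);
        double b = decimatedScale(sigma0, -1, Q);
        return Math.sqrt(a * a - b * b);
    }

    /** Width w.r.t. the previous level G_{p,q-1}, for q >= 0. */
    static double incremental(double sigma0, int q, int Q) {
        double a = decimatedScale(sigma0, q, Q);
        double b = decimatedScale(sigma0, q - 1, Q);
        return Math.sqrt(a * a - b * b);
    }

    public static void main(String[] args) {
        for (int q = -1; q <= 4; q++) {
            String inc = (q >= 0) ? String.format("%.4f", incremental(1.6, q, 3)) : "-";
            System.out.printf("q=%2d relative=%.4f incremental=%s%n", q, relative(1.6, q, 3), inc);
        }
    }
}
```

For $q = 4$ the sketch gives roughly 3.8265 versus 2.4525, the pair of values quoted in the text above.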
Fig. 25.14

Scale space structure for SIFT with $P = 3$ octaves and $Q = 3$ levels per octave. To perform local maximum detection (“max”) over the full octave, $Q + 2$ DoG scale space levels ($D_{p,-1}, \ldots, D_{p,Q}$) are required. The blue arrows indicate the decimation steps between successive Gaussian octaves. Since the DoG levels are obtained by subtracting pairs of Gaussian scale space levels, $Q + 3$ such levels ($G_{p,-1}, \ldots, G_{p,Q+1}$) are needed in each octave $\mathbf{G}_p$. The two vertical axes on the left show the absolute scale ($\sigma$) and the discrete scale index ($m$), respectively. Note that the values along the scale axis are logarithmic with constant multiplicative scale increments $\Delta_\sigma = 2^{1/Q}$. The absolute scale of the input image ($I$) is assumed as $\sigma_s = 0.5$.
[Figure: vertical axes show the absolute scale $\sigma$ (0.5 for the input image, then 1.6, 3.2, 6.4, 12.8, 25.6) and the scale index $m$ ($-1, \ldots, 12$); Octave 0 sits directly above the input image]


25.2  Key  Point  Selection  and  Refinement 

Key points are identified in three steps: (1) detection of extremal points in the DoG scale space, (2) position refinement by local interpolation, and (3) elimination of edge responses. These steps are detailed in the following and summarized in Algs. 25.3-25.6.

25.2.1  Local  Extrema  Detection 

In the first step, candidate interest points are detected as local extrema in the 3D DoG scale space that we described in the previous section. Extrema detection is performed independently within each octave $p$. For the sake of convenience we define the 3D scale space coordinate $\boldsymbol{c} = (u, v, q)$, composed of the spatial position $(u,v)$ and the level index $q$, as well as the function

$$ D(\boldsymbol{c}) := D_{p,q}(u,v) \tag{25.51} $$

as a short notation for selecting DoG values from a given octave $p$. Also, for collecting the DoG values in the 3D neighborhood around a scale space position $\boldsymbol{c}$, we define the map

$$ N_{\boldsymbol{c}}(i,j,k) := D(\boldsymbol{c} + i \cdot \boldsymbol{e}_i + j \cdot \boldsymbol{e}_j + k \cdot \boldsymbol{e}_k), \tag{25.52} $$

with $i, j, k \in \{-1, 0, 1\}$ and the 3D unit vectors

$$ \boldsymbol{e}_i = (1,0,0)^\top, \quad \boldsymbol{e}_j = (0,1,0)^\top, \quad \boldsymbol{e}_k = (0,0,1)^\top. \tag{25.53} $$

The neighborhood $N_{\boldsymbol{c}}$ includes the center value $D(\boldsymbol{c})$ and the 26 values of its immediate neighbors (see Fig. 25.15(a)). These values are used to estimate the 3D gradient vector and the Hessian matrix for the 3D scale space position $\boldsymbol{c}$, as will be described.
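Collecting the 27 values of $N_{\boldsymbol{c}}$ from Eq. (25.52) is a simple triple loop. The minimal Java sketch below assumes one DoG octave is stored as `dog[level][v][u]`, where `level` is an array index (for example $q + 1$ when levels $q = -1, \ldots, Q$ are stored from index 0); the class `DogNeighborhood` is hypothetical. The sketch only gathers the neighborhood. It does not implement the acceptance criterion for extrema, which the text goes on to specify.

```java
/** Hypothetical sketch of the 3x3x3 neighborhood N_c(i, j, k), Eq. (25.52). */
final class DogNeighborhood {

    /**
     * Returns nb with nb[i+1][j+1][k+1] = D(c + i*e_i + j*e_j + k*e_k) for
     * i, j, k in {-1, 0, 1}, where c = (u, v, level) and dog is stored as
     * dog[level][v][u]. The caller must keep c away from the octave borders.
     */
    static double[][][] neighborhood(double[][][] dog, int u, int v, int level) {
        double[][][] nb = new double[3][3][3];
        for (int i = -1; i <= 1; i++) {          // offset along u (e_i)
            for (int j = -1; j <= 1; j++) {      // offset along v (e_j)
                for (int k = -1; k <= 1; k++) {  // offset along the level axis (e_k)
                    nb[i + 1][j + 1][k + 1] = dog[level + k][v + j][u + i];
                }
            }
        }
        return nb;
    }
}
```

The center entry `nb[1][1][1]` is $D(\boldsymbol{c}) = N_{\boldsymbol{c}}(0,0,0)$; the remaining 26 entries are its immediate neighbors in space and scale.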

A DoG scale space position $\boldsymbol{c}$ is accepted as a local extremum (minimum or maximum) if the associated value $D(\boldsymbol{c}) = N_{\boldsymbol{c}}(0,0,0)$


2 

3 

4 

5 

6 

7 

8 

9 

10 

11 

12 

13 


BuildSiftScaleSpace(7,  crs,  a0,P,Q) 

Input:  /,  source  image;  crs,  sampling  scale;  cr0,  reference  scale  of 
the  first  octave;  P,  number  of  octaves;  Q,  number  of  scale  steps 
per  octave.  Returns  a  SIFT  scale  space  representation  (G,  D)  of 
the  image  I . 

cr init  cr o  •  2-1/^  t>  abs.  scale  at  level  (0,  —1),  Eq.  25.47 

cfinit  <—  J crpnit  —  a2  >  relative  scale  w.r.t.  crs,  Eq.  25.49 

Ginit  <—  I  *  H  ,a init  >  2D  Gaussian  filter  with  <rinit 

G0  <—  MakeGaussianOctave(Ginit,  0,  Q,  cr0)  >  Gauss,  octave  0 
for  p  <—  1 , . . . ,  P  —  1  do  t>  for  octaves  1 , . . . ,  P  —  1 

Gnext  <—  Decimate(Gp_1g_1)  >  see  Alg.  25.1 

G p  <—  MakeGaussianOctave(Gnext, p,  Q,  cr0)  >  octave  p 

G  <—  (G0, . . . ,  G p-i)  >  assemble  the  Gaussian  scale  space  G 

for  p  <—  0, . . . ,  P  —  1  do 

Dp  MakeDogOctave(Gp,p,  Q) 

D  (D0, . . . ,  DP_1)  >  assemble  the  DoG  scale  space  D 

return  (G,  D) 


14 

MakeGaussianOctave(Gbase,p,  Q,  <r0) 

Input:  Gbase,  Gaussian  base  level;  p,  octave  index;  Q,  scale  steps 

per  octave,  cr0,  reference  scale. 

Returns  a  new  Gaussian  octave 

G p  with  Q  +  3  levels  levels. 

15 

^p,  — 1  ^  ^base 

>  level  q  =  —  1 

16 

for  q  <—  0, . . . ,  Q  +  l  do 

>  levels  q  =  —  1 , . . . ,  Q  +  1 

17 

<7q  <7q  •  -\/22q/Q  ~ 

>  rel.  scale  w.r.t  base  level  Gbase 

18 

Gp,g  Gbase  *  PG,<T(3 

>  2D  Gaussian  filter  with  aq 

19 

^ P  ^  (^p,  —  1  5  •  •  •  5  ^p,Q+l  ) 

20 

return  Gp 

21 

MakeDogOctave(Gp,p,  Q) 

Input:  Gp,  Gaussian  octave;  p, 

octave  index;  Q ,  scale  steps  per 

octave.  Returns  a  new  DoG  octave  Dp  with  Q  +  2  levels. 

22 

for  q  < - 1, . . . ,  Q  do 

23 

^ p,q  ^*p,q+ 1  —  ® p,q 

>  diff.  of  Gaussians,  Eq.  25.30 

24 

^p  ^  (^p,  —  li  Dp>0,  •  •  •  5  Dp,g) 

>  levels  q  =  —  1 , . . . ,  Q 

25 

return  Dp 

25.2  Key  Point 
Selection  and 
Refinement 

Alg.  25.2 

Building  a  SIFT  scale  space. 
This  procedure  is  an  extension 
of  Alg.  25.1  and  takes  the 
same  parameters.  The  SIFT 
scale  space  (see  Fig.  25.14) 
consists  of  two  components: 
a  hierarchical  Gaussian  scale 
space  G  =  (G0,  .  .  .  ,  GP_1) 
with  P  octaves  and  a  (derived) 
hierarchical  DoG  scale  space 
D  =  (D0I...,DP_1).  Each 
Gaussian  octave  Gp  holds  Q  +  3 
levels  .  .  .  >  Gp ;  q-j-  i )  • 

At  each  Gaussian  octave,  the 
lowest  level  G„  _n  is  obtained 
by  decimating  level  Q  —  1  of  the 
previous  octave  Gp_1  (line  T). 
Every  DoG  octave  Dp  contains 
Q  +  2  levels  (DP)_l5  .  .  .  ,  Dp  Q). 
A  DoG  level  _  is  calculated 

y  5  h 

as  the  pointwise  difference  of 
two  adjacent  Gaussian  levels 

0^®  23). 

Typical  parameter  settings  are 
as  =  0.5,  a0  =  1.6,  Q  —  3, 

P  =  4. 


is  either  negative  and  also  smaller  or  positive  and  greater  than  all 
neighboring  values.  In  addition,  a  minimum  difference  textrm  >  0 
can  be  specified,  indicating  how  much  the  center  value  must  at  least 
deviate  from  the  surrounding  values.  The  decision  whether  a  given 
neighborhood  Nc  contains  a  local  minimum  or  maximum  can  thus  be 
expressed  as 


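$$\mathrm{IsLocalMin}(N_{\boldsymbol{c}}) := N_{\boldsymbol{c}}(0,0,0) < 0 \;\wedge\; N_{\boldsymbol{c}}(0,0,0) + t_{\mathrm{extrm}} < \min_{(i,j,k) \neq (0,0,0)} N_{\boldsymbol{c}}(i,j,k), \tag{25.54}$$

$$\mathrm{IsLocalMax}(N_{\boldsymbol{c}}) := N_{\boldsymbol{c}}(0,0,0) > 0 \;\wedge\; N_{\boldsymbol{c}}(0,0,0) - t_{\mathrm{extrm}} > \max_{(i,j,k) \neq (0,0,0)} N_{\boldsymbol{c}}(i,j,k) \tag{25.55}$$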
Fig. 25.15

Different 3D neighborhoods for detecting local extrema in the DoG scale space. The red cube represents the DoG value at the reference coordinate $\boldsymbol{c} = (u, v, q)$ at the spatial position $(u, v)$ at scale level $q$ (within some octave $p$). Full $3 \times 3 \times 3$ neighborhood with 26 elements (a); other types of neighborhoods with 18 (b) or 10 (c) elements, respectively, are also commonly used. A local maximum/minimum is detected if the DoG value at the center is greater/smaller than all neighboring values (green cubes).

[Figure panels: (a) 26-neighborhood, (b) 18-neighborhood, (c) 10-neighborhood]


(see procedure IsExtremum($N_{\boldsymbol{c}}$) in Alg. 25.5). As illustrated in Fig. 25.15(b–c), alternative 3D neighborhoods with 18 or 10 cells may be specified for extrema detection.
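Eqns. (25.54) and (25.55) can be sketched in Java as simple loops over the 26 neighbors. The sketch assumes the neighborhood is passed as an array n[i+1][j+1][k+1] holding $N_{\boldsymbol{c}}(i, j, k)$ (a hypothetical layout; the class name DogExtrema is also hypothetical). The 18- or 10-neighborhoods of Fig. 25.15(b–c) would simply skip additional cells.

```java
// Minimal sketch of IsLocalMin / IsLocalMax (Eqns. 25.54-25.55) on a full
// 26-neighborhood n[i+1][j+1][k+1] = N_c(i, j, k); tExtrm >= 0 is the
// required minimum difference. DogExtrema is a hypothetical name.
final class DogExtrema {

    static boolean isLocalMin(double[][][] n, double tExtrm) {
        double c = n[1][1][1];
        if (c >= 0) {
            return false;                           // a minimum must be negative
        }
        for (int i = 0; i < 3; i++) {
            for (int j = 0; j < 3; j++) {
                for (int k = 0; k < 3; k++) {
                    if (i == 1 && j == 1 && k == 1) continue;   // skip the center
                    if (c + tExtrm >= n[i][j][k]) return false;
                }
            }
        }
        return true;
    }

    static boolean isLocalMax(double[][][] n, double tExtrm) {
        double c = n[1][1][1];
        if (c <= 0) {
            return false;                           // a maximum must be positive
        }
        for (int i = 0; i < 3; i++) {
            for (int j = 0; j < 3; j++) {
                for (int k = 0; k < 3; k++) {
                    if (i == 1 && j == 1 && k == 1) continue;   // skip the center
                    if (c - tExtrm <= n[i][j][k]) return false;
                }
            }
        }
        return true;
    }

    static boolean isExtremum(double[][][] n, double tExtrm) {
        return isLocalMin(n, tExtrm) || isLocalMax(n, tExtrm);
    }
}
```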

25.2.2 Position Refinement

Once a local extremum is detected in the DoG scale space, only its discrete 3D coordinates $\boldsymbol{c} = (u, v, q)$ are known, consisting of the spatial grid position $(u, v)$ and the index ($q$) of the associated scale level. In the second step, a more accurate, continuous position for each candidate key point is estimated by fitting a quadratic function to the local neighborhood, as proposed in [37]. This is particularly important at the higher octaves of the scale space, where the spatial resolution becomes increasingly coarse due to successive decimation. Position refinement is based on a local second-order Taylor expansion of the discrete DoG function, which yields a continuous approximation function whose maximum or minimum can be found analytically. Additional details and illustrative examples are provided in Sec. C.3.2 of the Appendix.

At any extremal position $\boldsymbol{c} = (u, v, q)$ in octave $p$ of the hierarchical DoG scale space $\mathbf{D}$, the corresponding $3 \times 3 \times 3$ neighborhood $N_{\boldsymbol{c}}$ is used to estimate the elements of the continuous 3D gradient, that is,


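$$\nabla_D(\boldsymbol{c}) = \begin{pmatrix} d_x \\ d_y \\ d_\sigma \end{pmatrix} \approx \frac{1}{2} \cdot \begin{pmatrix} D(\boldsymbol{c} + \boldsymbol{e}_i) - D(\boldsymbol{c} - \boldsymbol{e}_i) \\ D(\boldsymbol{c} + \boldsymbol{e}_j) - D(\boldsymbol{c} - \boldsymbol{e}_j) \\ D(\boldsymbol{c} + \boldsymbol{e}_k) - D(\boldsymbol{c} - \boldsymbol{e}_k) \end{pmatrix}, \tag{25.56}$$

with $D()$ as defined in Eqn. (25.51). Similarly, the $3 \times 3$ Hessian matrix for position $\boldsymbol{c}$ is obtained as

$$\mathbf{H}_D(\boldsymbol{c}) = \begin{pmatrix} d_{xx} & d_{xy} & d_{x\sigma} \\ d_{xy} & d_{yy} & d_{y\sigma} \\ d_{x\sigma} & d_{y\sigma} & d_{\sigma\sigma} \end{pmatrix}, \tag{25.57}$$

with the required second-order derivatives estimated as

$$\begin{aligned}
d_{xx} &= D(\boldsymbol{c} - \boldsymbol{e}_i) - 2 \cdot D(\boldsymbol{c}) + D(\boldsymbol{c} + \boldsymbol{e}_i), \\
d_{yy} &= D(\boldsymbol{c} - \boldsymbol{e}_j) - 2 \cdot D(\boldsymbol{c}) + D(\boldsymbol{c} + \boldsymbol{e}_j), \\
d_{\sigma\sigma} &= D(\boldsymbol{c} - \boldsymbol{e}_k) - 2 \cdot D(\boldsymbol{c}) + D(\boldsymbol{c} + \boldsymbol{e}_k), \\
d_{xy} &= \tfrac{1}{4} \big[ D(\boldsymbol{c} + \boldsymbol{e}_i + \boldsymbol{e}_j) - D(\boldsymbol{c} - \boldsymbol{e}_i + \boldsymbol{e}_j) - D(\boldsymbol{c} + \boldsymbol{e}_i - \boldsymbol{e}_j) + D(\boldsymbol{c} - \boldsymbol{e}_i - \boldsymbol{e}_j) \big], \\
d_{x\sigma} &= \tfrac{1}{4} \big[ D(\boldsymbol{c} + \boldsymbol{e}_i + \boldsymbol{e}_k) - D(\boldsymbol{c} - \boldsymbol{e}_i + \boldsymbol{e}_k) - D(\boldsymbol{c} + \boldsymbol{e}_i - \boldsymbol{e}_k) + D(\boldsymbol{c} - \boldsymbol{e}_i - \boldsymbol{e}_k) \big], \\
d_{y\sigma} &= \tfrac{1}{4} \big[ D(\boldsymbol{c} + \boldsymbol{e}_j + \boldsymbol{e}_k) - D(\boldsymbol{c} - \boldsymbol{e}_j + \boldsymbol{e}_k) - D(\boldsymbol{c} + \boldsymbol{e}_j - \boldsymbol{e}_k) + D(\boldsymbol{c} - \boldsymbol{e}_j - \boldsymbol{e}_k) \big].
\end{aligned} \tag{25.58}$$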
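See the procedures Gradient($N_{\boldsymbol{c}}$) and Hessian($N_{\boldsymbol{c}}$) in Alg. 25.5 (p. 651) for additional details. From the gradient vector $\nabla_D(\boldsymbol{c})$ and the Hessian matrix $\mathbf{H}_D(\boldsymbol{c})$, the second-order Taylor expansion around point $\boldsymbol{c}$ is

$$\tilde{D}_{\boldsymbol{c}}(\boldsymbol{x}) = D(\boldsymbol{c}) + \nabla_D^\mathsf{T}(\boldsymbol{c}) \cdot (\boldsymbol{x} - \boldsymbol{c}) + \tfrac{1}{2} \cdot (\boldsymbol{x} - \boldsymbol{c})^\mathsf{T} \cdot \mathbf{H}_D(\boldsymbol{c}) \cdot (\boldsymbol{x} - \boldsymbol{c}), \tag{25.59}$$

for the continuous position $\boldsymbol{x} = (x, y, \sigma)^\mathsf{T}$. The scalar-valued function $\tilde{D}_{\boldsymbol{c}}(\boldsymbol{x}) \in \mathbb{R}$, with $\boldsymbol{c} = (u, v, q)^\mathsf{T}$ and $\boldsymbol{x} = (x, y, \sigma)^\mathsf{T}$, is a local, continuous approximation of the discrete DoG function $D_{p,q}(u, v)$ at octave $p$, scale level $q$, and spatial position $(u, v)$. This is a quadratic function with an extremum (maximum or minimum) at position

$$\breve{\boldsymbol{x}} = \boldsymbol{c} + \boldsymbol{d} = \boldsymbol{c} - \mathbf{H}_D^{-1}(\boldsymbol{c}) \cdot \nabla_D(\boldsymbol{c}), \tag{25.60}$$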


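with $\boldsymbol{d} = (x', y', \sigma')^\mathsf{T} = \breve{\boldsymbol{x}} - \boldsymbol{c}$, under the assumption that the inverse of the Hessian matrix $\mathbf{H}_D$ exists. By inserting the extremal position $\breve{\boldsymbol{x}}$ into Eqn. (25.59), the peak (minimum or maximum) value of the continuous approximation function $\tilde{D}$ is found as¹⁵

$$\begin{aligned} D_{\mathrm{peak}}(\boldsymbol{c}) = \tilde{D}_{\boldsymbol{c}}(\breve{\boldsymbol{x}}) &= D(\boldsymbol{c}) + \tfrac{1}{2} \cdot \nabla_D^\mathsf{T}(\boldsymbol{c}) \cdot (\breve{\boldsymbol{x}} - \boldsymbol{c}) \\ &= D(\boldsymbol{c}) + \tfrac{1}{2} \cdot \nabla_D^\mathsf{T}(\boldsymbol{c}) \cdot \boldsymbol{d}, \end{aligned} \tag{25.61}$$

where $\boldsymbol{d} = \breve{\boldsymbol{x}} - \boldsymbol{c}$ (cf. Eqn. (25.60)) denotes the 3D vector between the neighborhood's discrete center position $\boldsymbol{c}$ and the continuous extremal position $\breve{\boldsymbol{x}}$.

A scale space location $\boldsymbol{c}$ is only retained as a candidate interest point if the estimated magnitude of the DoG exceeds a given threshold $t_{\mathrm{peak}}$, that is, if

$$|D_{\mathrm{peak}}(\boldsymbol{c})| > t_{\mathrm{peak}}. \tag{25.62}$$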
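The refinement of Eqns. (25.56)–(25.62) for one neighborhood can be sketched as follows. This is a minimal Java illustration under stated assumptions: TaylorRefine is a hypothetical class, the neighborhood layout n[i+1][j+1][k+1] is assumed, and the $3 \times 3$ system is solved via an explicit adjugate inverse, which is only one of several reasonable choices.

```java
// Minimal sketch of the quadratic peak refinement, Eqns. (25.56)-(25.62).
// Input: a 3x3x3 DoG neighborhood n[i+1][j+1][k+1] = N_c(i, j, k), where
// i runs along x (u), j along y (v), and k along the scale axis (q).
// TaylorRefine is a hypothetical helper name.
final class TaylorRefine {

    // 3D gradient by central differences, Eqn. (25.56).
    static double[] gradient(double[][][] n) {
        return new double[] {
            0.5 * (n[2][1][1] - n[0][1][1]),
            0.5 * (n[1][2][1] - n[1][0][1]),
            0.5 * (n[1][1][2] - n[1][1][0]) };
    }

    // 3x3 Hessian, Eqns. (25.57)-(25.58).
    static double[][] hessian(double[][][] n) {
        double c = n[1][1][1];
        double dxx = n[0][1][1] - 2 * c + n[2][1][1];
        double dyy = n[1][0][1] - 2 * c + n[1][2][1];
        double dss = n[1][1][0] - 2 * c + n[1][1][2];
        double dxy = (n[2][2][1] - n[0][2][1] - n[2][0][1] + n[0][0][1]) / 4;
        double dxs = (n[2][1][2] - n[0][1][2] - n[2][1][0] + n[0][1][0]) / 4;
        double dys = (n[1][2][2] - n[1][0][2] - n[1][2][0] + n[1][0][0]) / 4;
        return new double[][] {{dxx, dxy, dxs}, {dxy, dyy, dys}, {dxs, dys, dss}};
    }

    // Displacement d = -H^{-1} * grad, Eqn. (25.60); null if H is (nearly) singular.
    static double[] displacement(double[][] h, double[] g) {
        double a = h[0][0], b = h[0][1], c = h[0][2];
        double d = h[1][0], e = h[1][1], f = h[1][2];
        double p = h[2][0], q = h[2][1], r = h[2][2];
        double det = a * (e * r - f * q) - b * (d * r - f * p) + c * (d * q - e * p);
        if (Math.abs(det) < 1e-12) {
            return null;
        }
        double[][] inv = {                          // adjugate divided by det
            {(e * r - f * q) / det, (c * q - b * r) / det, (b * f - c * e) / det},
            {(f * p - d * r) / det, (a * r - c * p) / det, (c * d - a * f) / det},
            {(d * q - e * p) / det, (b * p - a * q) / det, (a * e - b * d) / det}};
        double[] out = new double[3];
        for (int i = 0; i < 3; i++) {
            out[i] = -(inv[i][0] * g[0] + inv[i][1] * g[1] + inv[i][2] * g[2]);
        }
        return out;
    }

    // Peak value D_peak = D(c) + 0.5 * grad^T * d, Eqn. (25.61).
    static double peakValue(double center, double[] g, double[] d) {
        return center + 0.5 * (g[0] * d[0] + g[1] * d[1] + g[2] * d[2]);
    }

    // Contrast test |D_peak| > t_peak, Eqn. (25.62).
    static boolean isStrongPeak(double dPeak, double tPeak) {
        return Math.abs(dPeak) > tPeak;
    }
}
```

Since the finite differences are exact for quadratic functions, sampling $5 - (x-0.2)^2 - 2(y+0.1)^2 - 0.5(\sigma-0.3)^2$ at the 27 integer offsets yields $\boldsymbol{d} = (0.2, -0.1, 0.3)^\mathsf{T}$ and $D_{\mathrm{peak}} = 5$, which makes a convenient sanity check.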


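If the distance $\boldsymbol{d} = (x', y', \sigma')^\mathsf{T}$ from $\boldsymbol{c}$ to the estimated (continuous) peak position $\breve{\boldsymbol{x}}$ in Eqn. (25.60) is greater than a predefined limit (typically 0.5) in any spatial direction, the center point $\boldsymbol{c} = (u, v, q)^\mathsf{T}$ is moved to one of the neighboring DoG cells by maximally $\pm 1$ unit steps along the $u, v$ axes, that is,

$$\boldsymbol{c} \leftarrow \boldsymbol{c} + \begin{pmatrix} \min(1, \max(-1, \mathrm{round}(x'))) \\ \min(1, \max(-1, \mathrm{round}(y'))) \\ 0 \end{pmatrix}. \tag{25.63}$$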


The $q$ component of $\boldsymbol{c}$ is not modified in this version, that is, the search continues at the original scale level.¹⁶ Based on the surrounding 3D neighborhood of this new point, a Taylor expansion (Eqn. (25.60)) is again performed to estimate a new peak location. This is repeated until either the peak location is inside the current DoG cell or the allowed number of repositioning steps $n_{\mathrm{refine}}$ is reached (typically $n_{\mathrm{refine}}$ is set to 4 or 5). If successful, the result of this step is a candidate feature point

$$\breve{\boldsymbol{c}} = (\breve{x}, \breve{y}, q)^\mathsf{T} = \boldsymbol{c} + (x', y', 0)^\mathsf{T}. \tag{25.64}$$

15 See Eqn. (C.64) in Sec. C.3.3 in the Appendix for details.

16 This is handled differently in other SIFT implementations.


Notice that (in this implementation) the scale level $q$ remains unchanged even if the 3D Taylor expansion indicates that the estimated peak is located at another scale level. See procedure RefineKeyPosition() in Alg. 25.4 (p. 650) for a concise summary of these steps.
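The repositioning rule of Eqn. (25.63) reduces to a clamped rounding of the spatial displacement. In this minimal Java sketch (class name hypothetical), limit is the predefined bound, typically 0.5; a caller would recompute the Taylor step after each move, keep $q$ fixed, and give up after $n_{\mathrm{refine}}$ iterations.

```java
// Minimal sketch of one repositioning decision, Eqn. (25.63).
// dx, dy are the spatial components x', y' of the Taylor displacement d.
// Reposition is a hypothetical helper name.
final class Reposition {

    // Returns {du, dv}, each clamped to [-1, 1]. {0, 0} means the estimated
    // peak lies inside the current DoG cell, so refinement can stop.
    static int[] step(double dx, double dy, double limit) {
        if (Math.abs(dx) <= limit && Math.abs(dy) <= limit) {
            return new int[] {0, 0};
        }
        int du = (int) Math.max(-1, Math.min(1, Math.round(dx)));
        int dv = (int) Math.max(-1, Math.min(1, Math.round(dy)));
        return new int[] {du, dv};
    }
}
```

With a limit of at least 0.5, an offset of (0, 0) is returned exactly when both $|x'|$ and $|y'|$ are within the limit, so the return value doubles as the termination test.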

It should be mentioned that the original publication [153] is not particularly explicit about the aforementioned position refinement process and thus slightly different approaches are used in various open-source SIFT implementations. For example, the implementation in VLFeat¹⁷ [241] moves to one of the direct neighbors at the same scale level as described earlier, as long as $|x'|$ or $|y'|$ is greater than 0.6. AutoPano-SIFT¹⁸ by S. Nowozin calculates the length of the spatial displacement $d = \|(x', y')\|$ and discards the current point if $d > 2$. Otherwise it moves by $\Delta u = \mathrm{round}(x')$, $\Delta v = \mathrm{round}(y')$ without limiting the displacement to $\pm 1$. The Open-Source SIFT Library¹⁹ [106] used in OpenCV also makes full moves in the spatial directions and, in addition, potentially also changes the scale level by $\Delta q = \mathrm{round}(\sigma')$ in each iteration.


25.2.3 Suppressing Responses to Edge-Like Structures

In the previous step, candidate interest points were selected as those locations in the DoG scale space where the Taylor approximation had a local extremum and the extrapolated DoG value was above a given threshold ($t_{\mathrm{peak}}$). However, the DoG filter also responds strongly to edge-like structures. At such positions, interest points cannot be located with sufficient stability and repeatability. To eliminate the responses near edges, Lowe suggests the use of the principal curvatures of the 2D DoG result along the spatial $x, y$ axes, using the fact that the principal curvatures of a function are proportional to the eigenvalues of the function's Hessian matrix at a given point.

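For a particular lattice point $\boldsymbol{c} = (u, v, q)$ in DoG scale space, with neighborhood $N_{\boldsymbol{c}}$ (see Eqn. (25.52)), the $2 \times 2$ Hessian matrix for the spatial coordinates is

$$\mathbf{H}_{xy}(\boldsymbol{c}) = \begin{pmatrix} d_{xx} & d_{xy} \\ d_{xy} & d_{yy} \end{pmatrix}, \tag{25.65}$$

with $d_{xx}, d_{xy}, d_{yy}$ as defined in Eqn. (25.58), that is, these values can be extracted from the corresponding $3 \times 3$ Hessian matrix $\mathbf{H}_D(\boldsymbol{c})$ (see Eqn. (25.57)).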

The matrix $\mathbf{H}_{xy}(\boldsymbol{c})$ has two eigenvalues $\lambda_1, \lambda_2$, which we define as being ordered, such that $\lambda_1$ has the greater magnitude ($|\lambda_1| \geq |\lambda_2|$). If both eigenvalues for a point $\boldsymbol{c}$ are of similar magnitude, the function exhibits a high curvature along two orthogonal directions and in this case $\boldsymbol{c}$ is likely to be a good reference point that can be located reliably. In the optimal situation (e.g., near a corner), the ratio of the eigenvalues $\rho = \lambda_1/\lambda_2$ is close to 1. Alternatively, if the ratio $\rho$ is high it can be concluded that a single orientation dominates at this position, as is typically the case in the neighborhood of edges.

17 http://www.vlfeat.org/overview/sift.html.

18 http://sourceforge.net/projects/hugin/files/autopano-sift-C/.

19 http://robwhess.github.io/opensift/.

To estimate the ratio $\rho$ it is not necessary to calculate the eigenvalues themselves. Following the description in [153], the sum and product of the eigenvalues $\lambda_1, \lambda_2$ can be found as
Fig. 25.16

Limiting the ratio of principal curvatures (edge ratio) $\rho_{1,2}$ by specifying $\rho_{\max}$. The quantity $a$ (blue line) has a minimum when the eigenvalue ratio $\rho_{1,2} = \lambda_1/\lambda_2$ is one, that is, when the two eigenvalues $\lambda_1, \lambda_2$ are equal, indicating a corner-like event. Typically only one of the eigenvalues is dominant in the vicinity of image lines, such that $\rho_{1,2}$ and $a$ values are significantly increased. In this example, the principal curvature ratio $\rho_{1,2}$ is limited to $\rho_{\max} = 5.0$ by setting $a_{\max} = (5+1)^2/5 = 7.2$ (red line).


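$$\lambda_1 + \lambda_2 = \mathrm{trace}(\mathbf{H}_{xy}(\boldsymbol{c})) = d_{xx} + d_{yy}, \tag{25.66}$$

$$\lambda_1 \cdot \lambda_2 = \det(\mathbf{H}_{xy}(\boldsymbol{c})) = d_{xx} \cdot d_{yy} - d_{xy}^2. \tag{25.67}$$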


If the determinant $\det(\mathbf{H}_{xy})$ is negative, the principal curvatures of the underlying 2D function have opposite signs and thus point $\boldsymbol{c}$ can be discarded as not being an extremum. Otherwise, if the signs of both eigenvalues $\lambda_1, \lambda_2$ are the same, then the ratio

$$\rho_{1,2} = \frac{\lambda_1}{\lambda_2} \tag{25.68}$$

is positive (with $\lambda_1 = \rho_{1,2} \cdot \lambda_2$), and thus the expression
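$$a = \frac{[\mathrm{trace}(\mathbf{H}_{xy}(\boldsymbol{c}))]^2}{\det(\mathbf{H}_{xy}(\boldsymbol{c}))} = \frac{(\lambda_1 + \lambda_2)^2}{\lambda_1 \cdot \lambda_2} \tag{25.69}$$

$$= \frac{(\rho_{1,2} \cdot \lambda_2 + \lambda_2)^2}{\rho_{1,2} \cdot \lambda_2^2} = \frac{\lambda_2^2 \cdot (\rho_{1,2} + 1)^2}{\rho_{1,2} \cdot \lambda_2^2} = \frac{(\rho_{1,2} + 1)^2}{\rho_{1,2}} \tag{25.70}$$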


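depends only on the ratio $\rho_{1,2}$. If the determinant of $\mathbf{H}_{xy}$ is positive, the quantity $a$ has a minimum (4.0) at $\rho_{1,2} = 1$, that is, if the two eigenvalues are equal (see Fig. 25.16). Note that the quantity $a$ is the same for $\rho_{1,2} = \lambda_1/\lambda_2$ or $\rho_{1,2} = \lambda_2/\lambda_1$, since

$$a = \frac{(\rho_{1,2} + 1)^2}{\rho_{1,2}} = \frac{\big(\frac{1}{\rho_{1,2}} + 1\big)^2}{\frac{1}{\rho_{1,2}}}. \tag{25.71}$$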


To verify that the eigenvalue ratio $\rho_{1,2}$ at a given position $\boldsymbol{c}$ is below a specified limit $\rho_{\max}$ (making $\boldsymbol{c}$ a good candidate), it is thus sufficient to check the condition

$$a \leq a_{\max}, \quad \text{with} \quad a_{\max} = \frac{(\rho_{\max} + 1)^2}{\rho_{\max}}, \tag{25.72}$$
Fig. 25.17

Rejection of edge-like features by controlling the max. curvature ratio $\rho_{\max}$. The size of the circles is proportional to the scale level at which the corresponding key point was detected, the color indicating the containing octave (0 = red, 1 = green, 2 = blue, 3 = magenta).


without the need to actually calculate the individual eigenvalues $\lambda_1$ and $\lambda_2$.²⁰ The limit $\rho_{\max}$ should be greater than 1 and is typically chosen to be in the range $3, \ldots, 10$ ($\rho_{\max} = 10$ is suggested in [153]). The resulting value of $a_{\max}$ in Eqn. (25.72) is constant and needs to be calculated only once (see Alg. 25.3, line 2). Detection examples for varying values of $\rho_{\max}$ are shown in Fig. 25.17. Note that considerably more candidates appear near edges as $\rho_{\max}$ is raised from 3 to 40.
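The edge test needs only the three spatial second derivatives. A minimal Java sketch of Eqns. (25.66)–(25.72) follows; the class name is hypothetical, and rejecting a vanishing determinant (not only a negative one) is an assumption of this sketch.

```java
// Minimal sketch of the edge-response test, Eqns. (25.66)-(25.72), using the
// spatial second derivatives dxx, dyy, dxy of Eqn. (25.58).
// EdgeTest is a hypothetical helper name.
final class EdgeTest {

    // a_max = (rho_max + 1)^2 / rho_max, Eqn. (25.72); compute once.
    static double aMax(double rhoMax) {
        return (rhoMax + 1) * (rhoMax + 1) / rhoMax;
    }

    // Returns true if the point is kept (not edge-like).
    static boolean isNotEdge(double dxx, double dyy, double dxy, double aMax) {
        double trace = dxx + dyy;               // lambda1 + lambda2, Eqn. (25.66)
        double det = dxx * dyy - dxy * dxy;     // lambda1 * lambda2, Eqn. (25.67)
        if (det <= 0) {
            // Opposite-sign curvatures (det < 0); det == 0 is also rejected
            // here, which is an assumption of this sketch.
            return false;
        }
        double a = trace * trace / det;         // Eqn. (25.69)
        return a <= aMax;
    }
}
```

For example, aMax(10.0) is 12.1, so a point with $d_{xx} = d_{yy}$ and $d_{xy} = 0$ ($a = 4$) is kept, while $d_{xx} = -10$, $d_{yy} = -1$ ($a = 12.1$) just passes for $\rho_{\max} = 10$ but fails for $\rho_{\max} = 5$.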


20 A similar trick is used in the Harris corner detection algorithm (see Chapter 7).

25.3 Creating Local Descriptors

For each local extremum detected in the hierarchical DoG scale space, a candidate key point is created, which is subsequently refined to a continuous position following the steps we have just described (see Eqns. (25.56)–(25.64)). Then, for each refined key point $k' = (p, q, x, y)$, one or more (up to four) local descriptors are calculated. Multiple descriptors may be created for a position if the local orientation is not unique. This process involves the following steps:

1. Find the dominant orientation(s) of the key point $k'$ from the distribution of the gradients at the corresponding Gaussian scale space level.

2. For each dominant orientation, create a separate SIFT descriptor at the key point $k'$.


25.3.1 Finding Dominant Orientations

Local orientation from Gaussian scale space

Orientation vectors are obtained by sampling the gradient values of the hierarchical Gaussian scale space $G_{p,q}(u, v)$ (see Eqn. (25.32)). For any lattice position $(u, v)$ at octave $p$ and scale level $q$, the local gradient is calculated as

$$\nabla_{p,q}(u, v) = \begin{pmatrix} d_x \\ d_y \end{pmatrix} = \frac{1}{2} \cdot \begin{pmatrix} G_{p,q}(u+1, v) - G_{p,q}(u-1, v) \\ G_{p,q}(u, v+1) - G_{p,q}(u, v-1) \end{pmatrix}. \tag{25.73}$$


From these gradient vectors, the gradient magnitude and orientation (i.e., polar coordinates) are found as²¹

$$E_{p,q}(u, v) = \|\nabla_{p,q}(u, v)\| = \sqrt{d_x^2 + d_y^2}, \tag{25.74}$$

$$\phi_{p,q}(u, v) = \angle \nabla_{p,q}(u, v) = \tan^{-1}(d_y / d_x). \tag{25.75}$$

These scalar fields $E_{p,q}$ and $\phi_{p,q}$ are typically pre-calculated for all relevant octaves/levels $p, q$ of the Gaussian scale space $\mathbf{G}$.
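Eqns. (25.73)–(25.75) are sketched below for a single Gaussian level stored as g[v][u] (an assumed layout; the class name is hypothetical). The sketch uses Math.atan2 rather than a plain $\tan^{-1}(d_y/d_x)$ so that all four quadrants are distinguished, and maps the angle into $[0, 2\pi)$ to match the binning of the orientation histogram; both choices are assumptions of this example.

```java
// Minimal sketch of Eqns. (25.73)-(25.75) for one Gaussian level g[v][u].
// GaussGradient is a hypothetical helper name.
final class GaussGradient {

    // Returns {E, phi} at the interior lattice position (u, v),
    // with phi in [0, 2*pi).
    static double[] magnitudeAndOrientation(double[][] g, int u, int v) {
        double dx = 0.5 * (g[v][u + 1] - g[v][u - 1]);   // Eqn. (25.73)
        double dy = 0.5 * (g[v + 1][u] - g[v - 1][u]);
        double e = Math.hypot(dx, dy);                   // Eqn. (25.74)
        double phi = Math.atan2(dy, dx);                 // four-quadrant angle
        if (phi < 0) {
            phi += 2 * Math.PI;                          // map to [0, 2*pi)
        }
        return new double[] {e, phi};
    }
}
```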


21 See also Chapter 16, Sec. 16.1.

Orientation histograms

To find the dominant orientations for a given key point, a histogram of the orientation angles is calculated for the gradient vectors collected from a square window around the key point center. Typically the histogram has $n_{\mathrm{orient}} = 36$ bins, that is, the angular resolution is 10°. The orientation histogram is collected from a square region using an isotropic Gaussian weighting function whose width $\sigma_w$ is proportional to the decimated scale $\dot{\sigma}_q$ (see Eqn. (25.37)) of the key point's scale level $q$. Typically a Gaussian weighting function “with a σ that is 1.5 times that of the scale of the key point” [153] is used, that is,

$$\sigma_w = 1.5 \cdot \dot{\sigma}_q = 1.5 \cdot \sigma_0 \cdot 2^{q/Q}. \tag{25.76}$$

Note that $\sigma_w$ is independent of the octave index $p$ and thus the same weighting functions are used in each octave. To calculate the orientation histogram, the Gaussian gradients around the given key point are collected from a square region of size $2 r_w \times 2 r_w$, with

$$r_w = \lceil 2.5 \cdot \sigma_w \rceil \tag{25.77}$$

amply dimensioned to avoid numerical truncation effects. For the parameters listed in Table 25.3 ($\sigma_0 = 1.6$, $Q = 3$), the values for $\sigma_w$ (expressed in the octave's coordinate units) are


$q$ | 0 | 1 | 2 | 3
$\dot{\sigma}_q$ | 1.6000 | 2.0159 | 2.5398 | 3.2000
$\sigma_w = 1.5 \cdot \dot{\sigma}_q$ | 2.4000 | 3.0238 | 3.8098 | 4.8000
(25.78)
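The window parameters of Eqns. (25.76), (25.77), and (25.79) can be computed as follows; for $\sigma_0 = 1.6$ and $Q = 3$ the method sigmaW reproduces the $\sigma_w$ row of (25.78). This is a minimal sketch, and the class name OrientationWindow is hypothetical.

```java
// Minimal sketch of the orientation-histogram window, Eqns. (25.76),
// (25.77), and (25.79). OrientationWindow is a hypothetical helper name.
final class OrientationWindow {

    // sigma_w = 1.5 * sigma0 * 2^(q/Q), Eqn. (25.76).
    static double sigmaW(double sigma0, int q, int Q) {
        return 1.5 * sigma0 * Math.pow(2, (double) q / Q);
    }

    // r_w = ceil(2.5 * sigma_w), Eqn. (25.77).
    static int radiusW(double sigmaW) {
        return (int) Math.ceil(2.5 * sigmaW);
    }

    // Gaussian weight of grid point (u, v) w.r.t. key point (x, y), Eqn. (25.79).
    static double weight(double u, double v, double x, double y, double sigmaW) {
        double r2 = (u - x) * (u - x) + (v - y) * (v - y);
        return Math.exp(-r2 / (2 * sigmaW * sigmaW));
    }
}
```

Where $2.5 \cdot \sigma_w$ is an integer in exact arithmetic (here for $q = 0$ and $q = 3$), floating-point rounding may make Math.ceil return one more than expected; given that $r_w$ is deliberately generous, this is harmless.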

In Alg. 25.7, $\sigma_w$ and $r_w$ of the Gaussian weighting function are calculated in lines 7 and 8, respectively. At each lattice point $(u, v)$, the gradient vector $\nabla_{p,q}(u, v)$ is calculated in octave $p$ and level $q$ of the Gaussian scale space $\mathbf{G}$ (Alg. 25.7, line 16). From this, the gradient magnitude $E_{p,q}(u, v)$ and orientation $\phi_{p,q}(u, v)$ are obtained (lines 29–30). The corresponding Gaussian weight is calculated (in line 18) from the spatial distance between the grid point $(u, v)$ and the interest point $(x, y)$ as

$$w_G(u, v) = \exp\!\Big(-\frac{(u - x)^2 + (v - y)^2}{2 \sigma_w^2}\Big). \tag{25.79}$$

For the grid point $(u, v)$, the quantity to be accumulated into the orientation histogram is

$$z = E_{p,q}(u, v) \cdot w_G(u, v), \tag{25.80}$$

that is, the local gradient magnitude weighted by the Gaussian window function (Alg. 25.7, line 19).

The orientation histogram $h_\phi$ consists of $n_{\mathrm{orient}}$ bins and thus the continuous bin number for the angle $\phi(u, v)$ is

$$\kappa_\phi = \frac{n_{\mathrm{orient}}}{2\pi} \cdot \phi(u, v) \tag{25.81}$$

(see Alg. 25.7, line 20). To collect the continuous orientations into a histogram with discrete bins, quantization must be performed. The simplest approach is to select the “nearest” bin (by rounding) and to add the associated quantity (denoted $z$) entirely to the selected bin. Alternatively, to reduce quantization effects, a common technique is to split the quantity $z$ onto the two closest bins. Given the continuous bin value $\kappa_\phi$, the indexes of the two closest discrete bins are

$$k_0 = \lfloor \kappa_\phi \rfloor \bmod n_{\mathrm{orient}} \quad \text{and} \quad k_1 = (\lfloor \kappa_\phi \rfloor + 1) \bmod n_{\mathrm{orient}}, \tag{25.82}$$

respectively. The quantity $z$ (Eqn. (25.80)) is then partitioned and accumulated into the neighboring bins $k_0, k_1$ of the orientation histogram in the form

$$\begin{aligned} h_\phi(k_0) &\leftarrow h_\phi(k_0) + (1 - \alpha) \cdot z, \\ h_\phi(k_1) &\leftarrow h_\phi(k_1) + \alpha \cdot z, \end{aligned} \tag{25.83}$$

with $\alpha = \kappa_\phi - \lfloor \kappa_\phi \rfloor$. This process is illustrated by the example in Fig. 25.18 (see also Alg. 25.7, lines 21–25).
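Eqns. (25.81)–(25.83) amount to a linear split of $z$ between two circularly adjacent bins, as sketched below in Java. The class name is hypothetical, and the angle $\phi$ is assumed to lie in $[0, 2\pi)$.

```java
// Minimal sketch of Eqns. (25.81)-(25.83): add the weighted magnitude z of
// one gradient sample to the two orientation bins closest to phi.
// OrientationHistogram is a hypothetical helper name.
final class OrientationHistogram {

    static void accumulate(double[] h, double phi, double z) {
        int n = h.length;                           // n_orient, e.g. 36
        double kappa = n * phi / (2 * Math.PI);     // continuous bin index (25.81)
        int kFloor = (int) Math.floor(kappa);
        double alpha = kappa - kFloor;              // fractional part
        int k0 = Math.floorMod(kFloor, n);          // Eqn. (25.82)
        int k1 = Math.floorMod(kFloor + 1, n);      // wraps around to bin 0
        h[k0] += (1 - alpha) * z;                   // Eqn. (25.83)
        h[k1] += alpha * z;
    }
}
```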
Fig. 25.18

Accumulating into multiple histogram bins by linear interpolation. Assume that some quantity $z$ (blue bar) is to be added to the discrete histogram at the continuous position $\kappa_\phi$. The histogram bins adjacent to $\kappa_\phi$ are $k_0 = \lfloor \kappa_\phi \rfloor$ and $k_1 = \lfloor \kappa_\phi \rfloor + 1$. The fraction of $z$ accumulated into bin $k_1$ is $z_1 = z \cdot \alpha$ (red bar), with $\alpha = \kappa_\phi - k_0$. Analogously, the quantity added to bin $k_0$ is $z_0 = z \cdot (1 - \alpha)$ (green bar).

Fig. 25.19

Orientation histogram example. Each of the 36 radial bars corresponds to one entry in the orientation histogram $h_\phi$. The length (radius) of each radial bar with index $k$ is proportional to the accumulated value in the corresponding bin $h_\phi(k)$ and its orientation is $\phi_k$.

Fig. 25.20

Smoothing the orientation histogram (from Fig. 25.19) by repeatedly applying a circular low-pass filter with the 1D kernel $H = \frac{1}{4} \cdot (1, 2, 1)$.

Orientation histogram smoothing

Figure 25.19 shows a geometric rendering of the orientation histogram that explains the relevance of the cell indexes (discrete angles $\phi_k$) and the accumulated quantities ($z$). Before calculating the dominant orientations, the raw orientation histogram is usually smoothed by applying a (circular) low-pass filter, typically a simple 3-tap Gaussian or box-type filter (see procedure SmoothCircular() in Alg. 25.7, lines 6–16).²² Stronger smoothing is achieved by applying the filter multiple times, as illustrated in Fig. 25.20. In practice, two to three smoothing iterations appear to be sufficient.
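A circular 3-tap smoothing pass with the kernel $\frac{1}{4} \cdot (1, 2, 1)$ of Fig. 25.20 can be sketched as follows. This is a minimal illustration rather than a transcription of SmoothCircular(); the class name and the iteration parameter are hypothetical. Indices wrap around because orientation is periodic, and the total histogram mass is preserved.

```java
// Minimal sketch of circular histogram smoothing with H = (1, 2, 1) / 4.
// HistogramSmoothing is a hypothetical helper name; the input is not modified.
final class HistogramSmoothing {

    static double[] smoothCircular(double[] h, int iterations) {
        int n = h.length;
        double[] cur = h.clone();
        for (int it = 0; it < iterations; it++) {
            double[] next = new double[n];
            for (int k = 0; k < n; k++) {
                double left = cur[Math.floorMod(k - 1, n)];  // wraps at k = 0
                double right = cur[(k + 1) % n];             // wraps at k = n - 1
                next[k] = 0.25 * left + 0.5 * cur[k] + 0.25 * right;
            }
            cur = next;
        }
        return cur;
    }
}
```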


22 Histogram smoothing is not mentioned in the original SIFT publication [153] but used in most implementations.

Locating and interpolating orientation peaks

After smoothing the orientation histogram, the next step is to detect the peak entries in $h_\phi$. A bin $k$ is considered a significant orientation peak if $h_\phi(k)$ is a local maximum and its value is not less than a certain fraction of the maximum histogram entry, that is, only if

$$\begin{aligned} & h_\phi(k) > h_\phi((k-1) \bmod n_{\mathrm{orient}}) \;\wedge \\ & h_\phi(k) > h_\phi((k+1) \bmod n_{\mathrm{orient}}) \;\wedge \\ & h_\phi(k) \geq t_{\mathrm{domor}} \cdot \max_i h_\phi(i), \end{aligned} \tag{25.84}$$

with $t_{\mathrm{domor}} = 0.8$ as a typical limit.

To achieve a finer angular resolution than provided by the orientation histogram bins (typically spaced at 10° steps) alone, a continuous peak orientation is calculated by quadratic interpolation of the neighboring histogram values. Given a discrete peak index $k$, the interpolated (continuous) peak position $\breve{k}$ is obtained by fitting a quadratic function to the three successive histogram values $h_\phi(k-1)$, $h_\phi(k)$, $h_\phi(k+1)$ as²³

$$\breve{k} = k + \frac{h_\phi(k-1) - h_\phi(k+1)}{2 \cdot \big[h_\phi(k-1) - 2 \cdot h_\phi(k) + h_\phi(k+1)\big]}, \tag{25.85}$$


with all indexes taken modulo $n_{\mathrm{orient}}$. From Eqn. (25.81), the (continuous) dominant orientation angle $\theta \in [0, 2\pi)$ is then obtained as

$$\theta = (\breve{k} \bmod n_{\mathrm{orient}}) \cdot \frac{2\pi}{n_{\mathrm{orient}}}. \tag{25.86}$$

In this way, the dominant orientation can be estimated with accuracy much beyond the coarse resolution of the orientation histogram. Note that, in some cases, multiple histogram peaks are obtained for a given key point (see procedure FindPeakOrientations() in Alg. 25.6, lines 18–31). In this event, individual SIFT descriptors are created for each dominant orientation at the same key point position (see Alg. 25.3, line 8).
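Peak detection (25.84) and quadratic interpolation (25.85)–(25.86) can be combined into one loop over the smoothed histogram. In this minimal Java sketch (class name hypothetical), the final modulo is written so that interpolated positions slightly below 0 wrap to the top of the range, as required by $\theta \in [0, 2\pi)$.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of Eqns. (25.84)-(25.86): find the dominant orientations
// (radians, in [0, 2*pi)) of a circular orientation histogram h.
// PeakOrientations is a hypothetical helper name.
final class PeakOrientations {

    static List<Double> find(double[] h, double tDomor) {
        int n = h.length;
        double hMax = 0;
        for (double x : h) {
            hMax = Math.max(hMax, x);
        }
        List<Double> thetas = new ArrayList<>();
        for (int k = 0; k < n; k++) {
            double hc = h[k];
            double hp = h[Math.floorMod(k - 1, n)];
            double hn = h[(k + 1) % n];
            // Eqn. (25.84): strict local maximum that is strong enough.
            if (hc > hp && hc > hn && hc >= tDomor * hMax) {
                double denom = 2 * (hp - 2 * hc + hn);  // < 0 at a strict peak
                double kk = k + (hp - hn) / denom;      // Eqn. (25.85)
                double kMod = ((kk % n) + n) % n;       // proper modulo for kk < 0
                thetas.add(kMod * 2 * Math.PI / n);     // Eqn. (25.86)
            }
        }
        return thetas;
    }
}
```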

Figure 25.21 shows the orientation histograms for a set of detected key points in two different images after applying a varying number of smoothing steps. It also shows the interpolated dominant orientations $\theta$ calculated from the orientation histograms (Eqn. (25.86)) as radial vectors.


25.3.2 SIFT Descriptor Construction

For each key point $k' = (p, q, x, y)$ and each dominant orientation $\theta$, a corresponding SIFT descriptor is obtained by sampling the surrounding gradients at octave $p$ and level $q$ of the Gaussian scale space $\mathbf{G}$.

Descriptor geometry

The geometry underlying the calculation of SIFT descriptors is illustrated in Fig. 25.22. The descriptor combines the gradient orientation and magnitude from a square region of size $w_d \times w_d$, which is centered at the (continuous) position $(x, y)$ of the associated feature point and aligned with its dominant orientation $\theta$. The side length of the descriptor is set to $w_d = 10 \cdot \dot{\sigma}_q$, where $\dot{\sigma}_q$ denotes the key point's decimated scale (radius of the inner circle). It depends on the key point's scale level $q$ (see Table 25.4).
23 See Sec. C.1.2 in the Appendix for details.

Fig. 25.21

Orientation histograms and dominant orientations (examples). $n = 0, \ldots, 3$ smoothing iterations were applied to the orientation histograms. The (interpolated) dominant orientations are shown as radial lines that emanate from each feature's center point. The size of the histogram graphs is proportional to the absolute scale (see Table 25.3) at which the corresponding key point was detected. The colors indicate the index of the containing scale space octave $p$ (red = 0, green = 1, blue = 2, magenta = 3).


The region is partitioned into $n_{\mathrm{spat}} \times n_{\mathrm{spat}}$ sub-squares of identical size; typically $n_{\mathrm{spat}} = 4$ (see Table 25.5). The contribution of each gradient sample is attenuated by a circular Gaussian function of width $\sigma_d = 0.25 \cdot w_d$ (blue circle). The weights drop off radially and are practically zero at $r_d = 2.5 \cdot \sigma_d$ (green circle in Fig. 25.22). Thus only samples inside this zone need to be included for calculating the descriptor statistics.


Fig. 25.22

Geometry of a SIFT descriptor. The descriptor is calculated from a square support region that is centered at the key point's position $(x, y)$, aligned to the key point's dominant orientation $\theta$, and partitioned into $n_{\mathrm{spat}} \times n_{\mathrm{spat}}$ ($4 \times 4$) subsquares. The radius of the inner (gray) circle corresponds to the feature point's decimated scale value ($\dot{\sigma}_q$). The blue circle displays the width ($\sigma_d$) of the Gaussian weighting function applied to the gradients; its value is practically zero outside the green circle ($r_d$).
To achieve rotation invariance, the descriptor region is aligned to the key point's dominant orientation, as determined in the previous steps. To make the descriptor invariant to scale changes, its size $w_d$ (expressed in the grid coordinate units of octave $p$) is set proportional to the key point's decimated scale $\dot{\sigma}_q$ (see Eqn. (25.37)), that is,

$$w_d = s_d \cdot \dot{\sigma}_q = s_d \cdot \sigma_0 \cdot 2^{q/Q}, \tag{25.87}$$

where $s_d$ is a constant size factor. For $s_d = 10$ (see Table 25.5), the descriptor size $w_d$ ranges from 16.0 (at level 0) to 25.4 (at level 2), as listed in Table 25.4. Note that the descriptor size $w_d$ only depends on the scale level index $q$ and is independent of the octave index $p$. Thus the same descriptor geometry applies to all octaves of the scale space.


Table 25.4

SIFT descriptor dimensions for different scale levels $q$ (for size factor $s_d = 10$ and $Q = 3$ levels per octave). $\dot{\sigma}_q$ is the key point's decimated scale, $w_d$ is the descriptor size, $\sigma_d$ is the width of the Gaussian weighting function, and $r_d$ is the radius of the descriptor's support region. For $Q = 3$, only scale levels $q = 0, 1, 2$ are relevant. All lengths are expressed in the octave's (i.e., decimated) coordinate units.

$q$ | $\dot{\sigma}_q$ | $w_d = s_d \cdot \dot{\sigma}_q$ | $\sigma_d = 0.25 \cdot w_d$ | $r_d = 2.5 \cdot \sigma_d$
3 | 3.2000 | 32.000 | 8.0000 | 20.0000
2 | 2.5398 | 25.398 | 6.3496 | 15.8740
1 | 2.0159 | 20.159 | 5.0397 | 12.5992
0 | 1.6000 | 16.000 | 4.0000 | 10.0000
−1 | 1.2699 | 12.699 | 3.1748 | 7.9370

The descriptor's spatial resolution is specified by the parameter $n_{\mathrm{spat}}$. Typically $n_{\mathrm{spat}} = 4$ (as shown in Fig. 25.22) and thus the total number of spatial bins is $n_{\mathrm{spat}} \times n_{\mathrm{spat}} = 16$ (in this case). Each spatial descriptor bin relates to an area of size $(w_d/n_{\mathrm{spat}}) \times (w_d/n_{\mathrm{spat}})$. For example, at scale level $q = 0$ of any octave, $\dot{\sigma}_0 = 1.6$ and the corresponding descriptor size is $w_d = s_d \cdot \dot{\sigma}_0 = 10 \cdot 1.6 = 16.0$ (see Table 25.4). In this case (illustrated in Fig. 25.23), the descriptor covers $16 \times 16$ gradient samples, as suggested in [153]. Figure 25.24 shows an example with M-shaped feature point markers aligned to the dominant orientation and scaled to the descriptor region width $w_d$ of the associated scale level.
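The entries of Table 25.4 follow from Eqn. (25.87) and the factors 0.25 and 2.5 given for $\sigma_d$ and $r_d$. The small Java sketch below (hypothetical class name) recomputes them for any level $q$, together with the spatial bin size $w_d/n_{\mathrm{spat}}$.

```java
// Minimal sketch of the descriptor dimensions in Eqn. (25.87) and Table 25.4.
// DescriptorGeometry is a hypothetical helper name.
final class DescriptorGeometry {

    // Returns {sigmaDot_q, w_d, sigma_d, r_d} for scale level q.
    static double[] dims(double sigma0, int q, int Q, double sd) {
        double sigmaDot = sigma0 * Math.pow(2, (double) q / Q); // decimated scale
        double wd = sd * sigmaDot;      // descriptor side length, Eqn. (25.87)
        double sigmaD = 0.25 * wd;      // width of the Gaussian weighting
        double rd = 2.5 * sigmaD;       // radius beyond which weights are ~0
        return new double[] {sigmaDot, wd, sigmaD, rd};
    }

    // Side length of one spatial bin, w_d / n_spat.
    static double binSize(double wd, int nSpat) {
        return wd / nSpat;
    }
}
```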
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25.3  Creating  Local 
Descriptors 


Fig. 25.23
Geometry of the SIFT descriptor in relation to the discrete sample grid of the associated octave (level q = 0, parameter s_d = 10). In this case, the decimated scale is σ̇_0 = 1.6 and the width of the descriptor is w_d = s_d · σ̇_0 = 10 · 1.6 = 16.0.


Fig. 25.24
Marked key points aligned to their dominant orientation. Note that multiple feature instances are inserted at key point positions with more than one dominant orientation. The size of the markers is proportional to the absolute scale (σ_{p,q}, see Table 25.3) at which the corresponding key point was detected. The colors indicate the index p of the scale space octave containing the key point (red = 0, green = 1, blue = 2, magenta = 3).


Gradient features

The actual SIFT descriptor is a feature vector obtained by histogramming the gradient orientations of the Gaussian scale level within the descriptor's spatial support region. This requires a 3D histogram h_∇(i, j, k), with two spatial dimensions (i, j) for the n_spat × n_spat sub-regions and one additional dimension (k) for n_angl gradient orientations. This histogram thus contains n_spat × n_spat × n_angl bins.

Figure 25.25 illustrates this structure for the typical setup, with n_spat = 4 and n_angl = 8 (see Table 25.5). In this arrangement, eight orientation bins k = 0, ..., 7 are attached to each of the 16 spatial position bins (A1, ..., D4), which makes a total of 128 histogram bins.

For a given key point k' = (p, q, x, y), the histogram h_∇ accumulates the orientations (angles) of the gradients at the Gaussian scale space level G_{p,q} within the support region around the (continuous) center coordinate (x, y). At each grid point (u, v) inside this region, the gradient vector ∇_G is estimated (as described in Eqn. (25.73)), from which the gradient magnitude E(u, v) and orientation φ(u, v) are calculated (see Eqns. (25.74)-(25.75) and lines 27-31 in Alg. 25.7). For efficiency reasons, E(u, v) and φ(u, v) are typically pre-calculated for all relevant scale levels.
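A minimal Java sketch of this per-sample gradient estimate, modeled on lines 27-31 of Alg. 25.7: central differences on the Gaussian scale level, then magnitude and angle via Math.atan2. The array layout g[u][v] (u = horizontal, v = vertical index) and the class name GradientPolar are assumptions for illustration only.

```java
/** Hypothetical helper: gradient magnitude and orientation, cf. Alg. 25.7, lines 27-31. */
class GradientPolar {
    /**
     * Returns {E, phi} at (u, v) of the scale level g (indexed g[u][v]).
     * (u, v) must not lie on the border, since both neighbors are needed.
     */
    static double[] get(float[][] g, int u, int v) {
        double dx = 0.5 * (g[u + 1][v] - g[u - 1][v]);  // horizontal central difference
        double dy = 0.5 * (g[u][v + 1] - g[u][v - 1]);  // vertical central difference
        double mag = Math.sqrt(dx * dx + dy * dy);      // gradient magnitude E
        double phi = Math.atan2(dy, dx);                // orientation in [-pi, pi]
        return new double[] {mag, phi};
    }

    /** Precomputes E and phi for all interior pixels; border entries stay 0. */
    static double[][][] precompute(float[][] g) {
        int width = g.length, height = g[0].length;
        double[][][] out = new double[2][width][height];  // out[0] = E, out[1] = phi
        for (int u = 1; u < width - 1; u++) {
            for (int v = 1; v < height - 1; v++) {
                double[] ep = get(g, u, v);
                out[0][u][v] = ep[0];
                out[1][u][v] = ep[1];
            }
        }
        return out;
    }
}
```

Precomputing both maps once per scale level trades memory for speed, since overlapping support regions of nearby key points would otherwise recompute the same gradients.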

Each gradient sample contributes to the gradient histogram h_∇ a particular quantity z that depends on the gradient magnitude E and the distance of the sample point (u, v) from the key point's center (x, y). Again a Gaussian weighting function (of width σ_d) is used to attenuate samples with increasing spatial distance; thus the resulting accumulated quantity is
Fig. 25.25
SIFT descriptor structure for n_spat = 4 and n_angl = 8. Eight orientation bins k = 0, ..., 7 (a) are provided for each of the 16 spatial bins ij = A1, ..., D4. Thus the gradient histogram h_∇ holds 128 cells that are arranged to a 1D feature vector (A1_0, A1_1, ..., D4_6, D4_7) as shown in (b).
[Figure: (a) the eight orientation bins k = 0, ..., 7 attached to one spatial bin; (b) the 4 × 4 spatial bins, columns A-D and rows 1-4, in the order A1, A2, A3, A4, B1, ..., D4.]


$$ z(u,v) = E(u,v) \cdot w_G = E(u,v) \cdot \exp\Bigl(-\frac{(u-x)^2 + (v-y)^2}{2\sigma_d^2}\Bigr). \tag{25.88} $$

The width σ_d of the Gaussian function w_G() is proportional to the side length of the descriptor region, with

$$ \sigma_d = 0.25 \cdot w_d = 0.25 \cdot s_d \cdot \dot{\sigma}_q. \tag{25.89} $$


The weighting function drops off radially from the center and is practically zero at distance r_d = 2.5 · σ_d. Therefore, only gradient samples that are closer to the key point's center than r_d (green circle in Fig. 25.22) need to be considered in the gradient histogram calculation (see Alg. 25.8, lines 7 and 17). For a given key point k' = (p, q, x, y), sampling of the Gaussian gradients can thus be confined to the grid points (u, v) inside the square region bounded by x ± r_d and y ± r_d (see Alg. 25.8, lines 8-11 and 15-16). Each sample point (u, v) is then subjected to the affine transformation


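$$ \begin{pmatrix} u' \\ v' \end{pmatrix} = \frac{1}{w_d} \cdot \begin{pmatrix} \cos(-\theta) & -\sin(-\theta) \\ \sin(-\theta) & \cos(-\theta) \end{pmatrix} \cdot \begin{pmatrix} u - x \\ v - y \end{pmatrix}, \tag{25.90} $$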


which rotates the sample offset by −θ, thereby aligning the dominant orientation θ with the u' axis, and maps the original (rotated) square of size w_d × w_d to the unit square with coordinates u', v' ∈ [−0.5, +0.5] (see Fig. 25.23).
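For a single sample, Eqns. (25.88)-(25.90) can be sketched as follows. This is a minimal illustration assuming θ in radians and (x, y), w_d in the octave's coordinate units; the class CanonicalFrame and its methods are hypothetical names.

```java
/** Hypothetical helper for Eqns. (25.88)-(25.90); names are illustrative only. */
class CanonicalFrame {
    /** Maps sample (u, v) into the descriptor's unit square, Eqn. (25.90). */
    static double[] toCanonical(double u, double v, double x, double y,
                                double theta, double wd) {
        double cos = Math.cos(-theta), sin = Math.sin(-theta);  // rotation by -theta
        double du = u - x, dv = v - y;                          // offset from key point center
        return new double[] {
            (cos * du - sin * dv) / wd,  // u'
            (sin * du + cos * dv) / wd   // v'
        };
    }

    /** Accumulated quantity z = E * w_G, Eqn. (25.88), with sigma_d = 0.25 * w_d. */
    static double weightedMagnitude(double mag, double u, double v,
                                    double x, double y, double wd) {
        double sigmaD = 0.25 * wd;  // Eqn. (25.89)
        double r2 = (u - x) * (u - x) + (v - y) * (v - y);
        return mag * Math.exp(-r2 / (2 * sigmaD * sigmaD));
    }
}
```

For example, a sample lying 8 units from the center in direction θ (with w_d = 16) lands at u' = 0.5, v' = 0, regardless of θ; this is what makes the spatial binning independent of the key point's orientation.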

To make feature vectors rotation invariant, the individual gradient orientations φ(u, v) are rotated by the dominant orientation, that is,

$$ \varphi'(u,v) = \bigl(\varphi(u,v) - \theta\bigr) \bmod 2\pi, \tag{25.91} $$

with φ'(u, v) ∈ [0, 2π), such that the relative orientation is preserved.
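One practical pitfall with Eqn. (25.91): Java's % operator keeps the sign of the dividend, so (φ − θ) % (2π) can be negative. A floor-based modulus yields the required range [0, 2π). The class AngleUtil below is a hypothetical helper, not part of the book's code.

```java
/** Hypothetical helper for Eqn. (25.91): relative gradient orientation in [0, 2*pi). */
class AngleUtil {
    static final double TWO_PI = 2 * Math.PI;

    /** Floor-based modulus; unlike Java's %, the result is never negative. */
    static double mod2Pi(double a) {
        double r = a - TWO_PI * Math.floor(a / TWO_PI);
        return (r >= TWO_PI) ? 0.0 : r;  // guard against rounding up to exactly 2*pi
    }

    /** phi' = (phi - theta) mod 2*pi. */
    static double relativeOrientation(double phi, double theta) {
        return mod2Pi(phi - theta);
    }
}
```

For instance, a gradient at φ = −π/2 relative to θ = π/2 maps to π, whereas (−π/2 − π/2) % (2π) would give −π.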


For each gradient sample, with the continuous coordinates (u', v', φ'), the corresponding quantity z(u, v) (Eqn. (25.88)) is accumulated into the 3D gradient histogram h_∇. For a complete description of this step see procedure UpdateGradientHistogram() in Alg. 25.9. It first maps the coordinates (u', v', φ') (see Eqn. (25.90)) to the continuous histogram position (i', j', k') by

$$ \begin{aligned} i' &= n_{\mathrm{spat}} \cdot u' + 0.5 \cdot (n_{\mathrm{spat}} - 1), \\ j' &= n_{\mathrm{spat}} \cdot v' + 0.5 \cdot (n_{\mathrm{spat}} - 1), \\ k' &= \varphi' \cdot \frac{n_{\mathrm{angl}}}{2\pi}, \end{aligned} \tag{25.92} $$

such that i', j' ∈ [−0.5, n_spat − 0.5] and k' ∈ [0, n_angl).

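Analogous to inserting into a continuous position of a 1D histogram by linear interpolation over two bins (see Fig. 25.18), the quantity z is distributed over eight neighboring histogram bins by trilinear interpolation. The fractions of z contributing to the individual histogram bins are determined by the distances of the coordinates (i', j', k') from the discrete indexes (i, j, k) of the affected histogram bins. The indexes (i, j, k) are found as the set of possible combinations {i_0, i_1} × {j_0, j_1} × {k_0, k_1}, with

$$ \begin{aligned} i_0 &= \lfloor i' \rfloor, & i_1 &= i_0 + 1, \\ j_0 &= \lfloor j' \rfloor, & j_1 &= j_0 + 1, \\ k_0 &= \lfloor k' \rfloor \bmod n_{\mathrm{angl}}, & k_1 &= (k_0 + 1) \bmod n_{\mathrm{angl}}, \end{aligned} \tag{25.93} $$

and the corresponding fractions (weights) are

$$ \begin{aligned} \alpha_0 &= \lfloor i' \rfloor + 1 - i' = i_1 - i', & \alpha_1 &= 1 - \alpha_0, \\ \beta_0 &= \lfloor j' \rfloor + 1 - j' = j_1 - j', & \beta_1 &= 1 - \beta_0, \\ \gamma_0 &= \lfloor k' \rfloor + 1 - k', & \gamma_1 &= 1 - \gamma_0, \end{aligned} \tag{25.94} $$

and the (eight) affected bins of the gradient histogram are finally updated as

$$ \begin{aligned} h_\nabla(i_0, j_0, k_0) &\stackrel{+}{\leftarrow} z \cdot \alpha_0 \cdot \beta_0 \cdot \gamma_0, & h_\nabla(i_0, j_0, k_1) &\stackrel{+}{\leftarrow} z \cdot \alpha_0 \cdot \beta_0 \cdot \gamma_1, \\ h_\nabla(i_1, j_0, k_0) &\stackrel{+}{\leftarrow} z \cdot \alpha_1 \cdot \beta_0 \cdot \gamma_0, & h_\nabla(i_1, j_0, k_1) &\stackrel{+}{\leftarrow} z \cdot \alpha_1 \cdot \beta_0 \cdot \gamma_1, \\ h_\nabla(i_0, j_1, k_0) &\stackrel{+}{\leftarrow} z \cdot \alpha_0 \cdot \beta_1 \cdot \gamma_0, & h_\nabla(i_0, j_1, k_1) &\stackrel{+}{\leftarrow} z \cdot \alpha_0 \cdot \beta_1 \cdot \gamma_1, \\ h_\nabla(i_1, j_1, k_0) &\stackrel{+}{\leftarrow} z \cdot \alpha_1 \cdot \beta_1 \cdot \gamma_0, & h_\nabla(i_1, j_1, k_1) &\stackrel{+}{\leftarrow} z \cdot \alpha_1 \cdot \beta_1 \cdot \gamma_1. \end{aligned} \tag{25.95} $$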

Attention must be paid to the fact that the coordinate k represents an orientation and must therefore be treated in a circular manner, as illustrated in Fig. 25.26 (also see Alg. 25.9, lines 11-12).
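The following Java sketch combines Eqns. (25.92)-(25.95) into one update step, in the spirit of UpdateGradientHistogram() in Alg. 25.9 but not a transcription of it. Two details are our own assumptions: contributions whose spatial index i or j falls outside 0, ..., n_spat − 1 (possible because i', j' range over [−0.5, n_spat − 0.5]) are simply dropped, and Math.floorMod wraps the orientation index. The class GradientHistogram is hypothetical.

```java
/** Hypothetical sketch of the tri-linear histogram update, Eqns. (25.92)-(25.95). */
class GradientHistogram {
    final int nSpat, nAngl;
    final double[][][] h;  // h[i][j][k]: spatial bins i, j and orientation bin k

    GradientHistogram(int nSpat, int nAngl) {
        this.nSpat = nSpat;
        this.nAngl = nAngl;
        this.h = new double[nSpat][nSpat][nAngl];
    }

    /** u', v' in [-0.5, 0.5], phi' in [0, 2*pi), z = quantity to accumulate. */
    void update(double uu, double vv, double phi, double z) {
        double ic = nSpat * uu + 0.5 * (nSpat - 1);  // i', Eqn. (25.92)
        double jc = nSpat * vv + 0.5 * (nSpat - 1);  // j'
        double kc = phi * nAngl / (2 * Math.PI);     // k'
        int i0 = (int) Math.floor(ic), j0 = (int) Math.floor(jc);
        int kf = (int) Math.floor(kc);
        int[] ii = {i0, i0 + 1};                     // Eqn. (25.93)
        int[] jj = {j0, j0 + 1};
        int k0 = Math.floorMod(kf, nAngl);
        int[] kk = {k0, (k0 + 1) % nAngl};           // circular orientation axis
        double a0 = i0 + 1 - ic, b0 = j0 + 1 - jc, g0 = kf + 1 - kc;  // Eqn. (25.94)
        double[] alpha = {a0, 1 - a0}, beta = {b0, 1 - b0}, gamma = {g0, 1 - g0};
        for (int a = 0; a < 2; a++) {
            int i = ii[a];
            if (i < 0 || i >= nSpat) continue;  // assumption: drop out-of-range spatial bins
            for (int b = 0; b < 2; b++) {
                int j = jj[b];
                if (j < 0 || j >= nSpat) continue;
                for (int c = 0; c < 2; c++) {
                    h[i][j][kk[c]] += z * alpha[a] * beta[b] * gamma[c];  // Eqn. (25.95)
                }
            }
        }
    }
}
```

A sample at the descriptor center (u' = v' = 0, i' = j' = 1.5 for n_spat = 4) is split evenly among the four central spatial bins, while an orientation just below 2π feeds both bin n_angl − 1 and bin 0.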

For each histogram bin, the range of contributing gradient samples covers half of each neighboring bin, that is, the support regions of neighboring bins overlap, as illustrated in Fig. 25.27.


Normalizing SIFT descriptors

The elements of the gradient histogram h_∇ are the raw material for the SIFT feature vectors f_sift. The process of calculating the feature vectors from the gradient histogram is described in Alg. 25.10.
Fig. 25.26
3D structure of the gradient histogram, with n_spat × n_spat = 4 × 4 bins for the spatial dimensions (i, j) and n_angl = 8 bins along the orientation axis (k). For the histogram to accumulate a quantity z into some continuous position (i', j', k'), eight adjacent bins receive different fractions of z that are determined by trilinear interpolation (a). Note that the bins along the orientation axis (k) are treated circularly; for example, bins at k = 0 are also considered adjacent to the bins at k = 7 (b).


Fig. 25.27
Overlapping support regions in the gradient field. Due to the trilinear interpolation used in the histogram calculation, the spatial regions associated with the cells of the orientation histogram h_∇ overlap. The shading of the circles indicates the weight w_G assigned to each sample by the Gaussian weighting function, whose value depends on the distance of each sample from the key point's center (see Eqn. (25.88)).


Initially, the 3D gradient histogram h_∇ (which contains continuous values) of size n_spat × n_spat × n_angl is flattened to a 1D vector f of length n_spat² · n_angl (typ. 128), with

$$ f\bigl((i \cdot n_{\mathrm{spat}} + j) \cdot n_{\mathrm{angl}} + k\bigr) \leftarrow h_\nabla(i, j, k), \tag{25.96} $$

for i, j = 0, ..., n_spat − 1 and k = 0, ..., n_angl − 1. The elements in f are thus arranged in the same order as shown in Fig. 25.25, with the orientation index k being the fastest moving and the spatial index i being the slowest (see Alg. 25.10, lines 3-8).24

24 Note that different ordering schemes for arranging the elements of the feature vector are used in various SIFT implementations. For successful matching, the ordering of the elements must, of course, be identical.

Changes in image contrast have a linear impact upon the gradient magnitude and thus also upon the values of the feature vector f. To eliminate these effects, the vector f is subsequently normalized to

$$ f(m) \leftarrow \frac{1}{\|f\|} \cdot f(m), \tag{25.97} $$

for all m, such that f has unit norm (see Alg. 25.10, line 9). Since the gradient is calculated from local pixel differences, changes in absolute brightness do not affect the gradient magnitude, unless saturation occurs. Such nonlinear illumination changes tend to produce peak gradient values, which are compensated for by clipping the values of f to a predefined maximum t_fclip, that is,

$$ f(m) \leftarrow \min\bigl(f(m), t_{\mathrm{fclip}}\bigr), \tag{25.98} $$

with typically t_fclip = 0.2, as suggested in [153] (see Alg. 25.10, line 10). After this step, f is normalized once again, as in Eqn. (25.97). Finally, the real-valued feature vector f is converted to an integer vector by

$$ f_{\mathrm{sift}}(m) \leftarrow \min\bigl(\operatorname{round}(s_{\mathrm{fscale}} \cdot f(m)), 255\bigr), \tag{25.99} $$

with s_fscale being a predefined constant (typ. s_fscale = 512). The elements of f_sift are in the range [0, 255], so they can be conveniently encoded and stored as a byte sequence (see Alg. 25.10, line 12).

The final SIFT descriptor for a given key point k' = (p, q, x, y) is a tuple

$$ s = (x', y', \sigma, \theta, f_{\mathrm{sift}}), \tag{25.100} $$

which contains the key point's interpolated position x', y' (in original image coordinates), the absolute scale σ, its dominant orientation θ, and the corresponding integer-valued gradient feature vector f_sift (see Alg. 25.8, line 27). Remember that multiple SIFT descriptors may be produced for different dominant orientations located at the same key point position. These will have the same position and scale values but different θ and f_sift data.
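The flattening and normalization steps of Eqns. (25.96)-(25.99) can be sketched in Java as follows, with t_fclip and s_fscale passed as arguments (typically 0.2 and 512). The class FeatureVector is hypothetical; it returns int[] rather than byte[] because Java's byte type is signed and cannot hold the value 255 directly.

```java
/** Hypothetical sketch of Eqns. (25.96)-(25.99); not the book's MakeFeatureVector(). */
class FeatureVector {
    /** Flattens h[i][j][k] with k moving fastest and i slowest, Eqn. (25.96). */
    static double[] flatten(double[][][] h) {
        int nSpat = h.length, nAngl = h[0][0].length;
        double[] f = new double[nSpat * nSpat * nAngl];
        for (int i = 0; i < nSpat; i++)
            for (int j = 0; j < nSpat; j++)
                for (int k = 0; k < nAngl; k++)
                    f[(i * nSpat + j) * nAngl + k] = h[i][j][k];
        return f;
    }

    /** Scales f to unit Euclidean norm in place, Eqn. (25.97); a zero vector is left as is. */
    static void normalize(double[] f) {
        double s = 0;
        for (double x : f) s += x * x;
        double norm = Math.sqrt(s);
        if (norm > 0)
            for (int m = 0; m < f.length; m++) f[m] /= norm;
    }

    /** Normalize, clip at tFclip, renormalize, then quantize to [0, 255]. */
    static int[] makeFeatureVector(double[][][] h, double tFclip, double sFscale) {
        double[] f = flatten(h);
        normalize(f);
        for (int m = 0; m < f.length; m++) f[m] = Math.min(f[m], tFclip);  // Eqn. (25.98)
        normalize(f);
        int[] fsift = new int[f.length];
        for (int m = 0; m < f.length; m++)
            fsift[m] = (int) Math.min(Math.round(sFscale * f[m]), 255);   // Eqn. (25.99)
        return fsift;
    }
}
```

Because s_fscale = 512 exceeds 255, any element above roughly 0.5 after renormalization saturates; with 128 bins and clipping at 0.2 this rarely matters, but tiny test histograms with only a few non-zero bins hit the limit immediately.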


25.4 SIFT Algorithm Summary

This section contains a collection of algorithms that summarizes the SIFT feature extraction process described in the previous sections of this chapter.

Algorithm 25.3 shows the top-level procedure GetSiftFeatures(I), which returns a sequence of SIFT feature descriptors for the given image I. The remaining parts of Alg. 25.3 describe the key point detection as extrema of the DoG scale space. The refinement of key point positions is covered in Alg. 25.4. Algorithm 25.5 contains the procedures used for neighborhood operations, detecting local extrema, and the calculation of the gradient and Hessian matrix in 3D. Algorithm 25.6 covers the operations related to finding the dominant orientations at a given key point location, based on the orientation histogram that is calculated in Alg. 25.7. The final formation of the SIFT descriptors is described in Alg. 25.8, which is based on the procedures defined in Algs. 25.9 and 25.10. The global constants used throughout these algorithms are listed in Table 25.5, together with the corresponding Java identifiers in the associated source code (see Sec. 25.7).
Table 25.5
Predefined constants used in the SIFT algorithms (Algs. 25.3-25.11).

Scale space parameters
Symbol     Java id.    Value   Description
Q          Q           3       scale steps (levels) per octave
P          P           4       number of scale space octaves
σ_s        sigma_s     0.5     sampling scale (nominal smoothing of the input image)
σ_0        sigma_0     1.6     base scale of level 0 (base smoothing)

Key-point detection
Symbol     Java id.    Value   Description
n_orient   n_Orient    36      number of orientation bins (angular resolution) used for calculating the dominant key point orientation
n_refine   n_Refine    5       max. number of iterations for repositioning a key point
n_smooth   n_Smooth    2       number of smoothing iterations applied to the orientation histogram
ρ_max      rho_Max     10.0    max. ratio of principal curvatures (3, ..., 10)
t_domor    t_DomOr     0.8     min. value in orientation histogram for selecting dominant orientations (rel. to max. entry)
t_extrm    t_Extrm     0.0     min. difference w.r.t. any neighbor for extrema detection
t_mag      t_Mag       0.01    min. DoG magnitude for initial key point candidates
t_peak     t_Peak      0.01    min. DoG magnitude at interpolated peaks

Feature descriptor
Symbol     Java id.    Value   Description
n_spat     n_Spat      4       number of spatial descriptor bins along each x/y axis
n_angl     n_Angl      8       number of angular descriptor bins
s_d        s_Desc      10.0    spatial size factor of descriptor (relative to feature scale)
s_fscale   s_Fscale    512.0   scale factor for converting normalized feature values to byte values in [0, 255]
t_fclip    t_Fclip     0.2     max. value for clipping elements of normalized feature vectors

Feature matching
Symbol     Java id.    Value   Description
ρ_max      rho_Max     0.8     max. ratio of best and second-best matching feature distance
25.5 Matching SIFT Features

Most applications of SIFT features aim at locating corresponding interest points in two or more images of the same scene, for example, for matching stereo pairs, panorama stitching, or feature tracking. Other applications, like self-localization or object recognition, might use a large database of model descriptors, where the task is to match these to the SIFT features detected in a new image or video sequence. All these applications require possibly large numbers of pairs of SIFT features to be compared reliably and efficiently.


25.5.1 Feature Distance and Match Quality

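In a typical situation, two sequences of SIFT features S^(a) and S^(b) are extracted independently from a pair of input images I_a, I_b, that is,

$$ S^{(a)} = \bigl(s_1^{(a)}, \ldots, s_{N_a}^{(a)}\bigr) \quad \text{and} \quad S^{(b)} = \bigl(s_1^{(b)}, \ldots, s_{N_b}^{(b)}\bigr). $$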

The goal is to find matching descriptors in the two feature sets. The similarity between a given pair of descriptors, s_i = (x_i, y_i, σ_i, θ_i, f_i) and s_j = (x_j, y_j, σ_j, θ_j, f_j), is measured by the distance between the corresponding feature vectors f_i, f_j, that is,


Alg. 25.3
SIFT feature extraction (part 1). Top-level SIFT procedure. Global parameters: σ_s, σ_0, t_mag, Q, P (see Table 25.5).

1:  GetSiftFeatures(I)
      Input: I, the source image (scalar-valued).
      Returns a sequence of SIFT feature descriptors detected in I.
2:    (G, D) ← BuildSiftScaleSpace(I, σ_s, σ_0, P, Q)    ▷ Alg. 25.2
3:    C ← GetKeyPoints(D)
4:    S ← ( )    ▷ empty list of SIFT descriptors
5:    for all k' ∈ C do    ▷ k' = (p, q, x, y)
6:      A ← GetDominantOrientations(G, k')    ▷ Alg. 25.6
7:      for all θ ∈ A do
8:        s ← MakeSiftDescriptor(G, k', θ)    ▷ Alg. 25.8
9:        S ← S ⌢ (s)
10:   return S

11: GetKeyPoints(D)
      D: DoG scale space (with P octaves, each containing Q levels).
      Returns a set of key points located in D.
12:   C ← ( )    ▷ empty list of key points
13:   for p ← 0, ..., P − 1 do    ▷ for all octaves p
14:     for q ← 0, ..., Q − 1 do    ▷ for all scale levels q
15:       E ← FindExtrema(D, p, q)
16:       for all k ∈ E do    ▷ k = (p, q, u, v)
17:         k' ← RefineKeyPosition(D, k)    ▷ Alg. 25.4
18:         if k' ≠ nil then    ▷ k' = (p, q, x, y)
19:           C ← C ⌢ (k')    ▷ add refined key point k'
20:   return C

21: FindExtrema(D, p, q)
22:   D_{p,q} ← GetScaleLevel(D, p, q)
23:   (M, N) ← Size(D_{p,q})
24:   E ← ( )    ▷ empty list of extrema
25:   for u ← 1, ..., M − 2 do
26:     for v ← 1, ..., N − 2 do
27:       if |D_{p,q}(u, v)| > t_mag then
28:         k ← (p, q, u, v)
29:         N_c ← GetNeighborhood(D, k)    ▷ Alg. 25.5
30:         if IsExtremum(N_c) then    ▷ Alg. 25.5
31:           E ← E ⌢ (k)    ▷ add k to E
32:   return E


$$ \mathrm{dist}(s_i, s_j) := \| f_i - f_j \|, \tag{25.101} $$

where ‖·‖ denotes an appropriate norm (typically Euclidean; alternatives are discussed later in this section).25

Note that this distance is measured between individual points distributed in a high-dimensional (typically 128-dimensional) vector space that is only sparsely populated. Since there is always a best-matching counterpart for a given descriptor, matches may occur between unrelated features even if the correct feature is not contained in the target set. This is particularly critical if feature matching is used to determine whether two images show any correspondence at all.
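For the distance in Eqn. (25.101), the following hypothetical helpers compute the Euclidean norm together with the L1 and L∞ alternatives, assuming feature vectors stored as int[] arrays of equal length (e.g., the 128 quantized values of f_sift). The class name FeatureDistance is illustrative only.

```java
/** Hypothetical helpers for the feature distance of Eqn. (25.101). */
class FeatureDistance {
    /** Euclidean (L2) distance, the usual choice. */
    static double l2(int[] fa, int[] fb) {
        double s = 0;
        for (int m = 0; m < fa.length; m++) {
            double d = fa[m] - fb[m];
            s += d * d;
        }
        return Math.sqrt(s);
    }

    /** L1 (city-block) distance. */
    static double l1(int[] fa, int[] fb) {
        double s = 0;
        for (int m = 0; m < fa.length; m++) s += Math.abs(fa[m] - fb[m]);
        return s;
    }

    /** L-infinity (maximum) distance. */
    static double lInf(int[] fa, int[] fb) {
        double s = 0;
        for (int m = 0; m < fa.length; m++) s = Math.max(s, Math.abs(fa[m] - fb[m]));
        return s;
    }
}
```

Note that any of these functions returns some finite value for every pair, so a nearest neighbor always exists; the distance alone therefore cannot tell whether a true correspondence is present.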

Obviously, significant matches should exhibit small feature distances, but setting a fixed limit on the acceptable feature distance

25 See also Sec. B.1.2 in the Appendix.
Alg. 25.4
SIFT feature extraction (part 2). Position refinement. Global parameters: n_refine, t_peak, ρ_max (see Table 25.5).

1:  RefineKeyPosition(D, k)
      Input: D, hierarchical DoG scale space; k = (p, q, u, v), candidate (extremal) position.
      Returns a refined key point k' or nil if no proper key point could be localized at or near the extremal position k.
2:    a_max ← (ρ_max + 1)² / ρ_max    ▷ see Eq. 25.72
3:    k' ← nil    ▷ refined key point
4:    done ← false
5:    n ← 1    ▷ number of repositioning steps
6:    while ¬done ∧ n ≤ n_refine ∧ IsInside(D, k) do
7:      N_c ← GetNeighborhood(D, k)    ▷ Alg. 25.5
8:      ∇ = (d_x, d_y, d_σ)ᵀ ← Gradient(N_c)    ▷ Alg. 25.5
9:      H_D = ((d_xx, d_xy, d_xσ), (d_xy, d_yy, d_yσ), (d_xσ, d_yσ, d_σσ)) ← Hessian(N_c)    ▷ Alg. 25.5
10:     if det(H_D) = 0 then    ▷ H_D is not invertible
11:       done ← true    ▷ ignore this point and finish
12:     else
13:       d = (x', y', σ')ᵀ ← −H_D⁻¹ · ∇    ▷ Eq. 25.60
14:       if |x'| < 0.5 ∧ |y'| < 0.5 then    ▷ stay in the same DoG cell
15:         done ← true
16:         D_peak ← N_c(0, 0, 0) + ½ · ∇ᵀ · d    ▷ Eq. 25.61
17:         H_xy ← ((d_xx, d_xy), (d_xy, d_yy))    ▷ extract 2D Hessian from H_D
18:         if |D_peak| > t_peak ∧ det(H_xy) > 0 then
19:           a ← [trace(H_xy)]² / det(H_xy)    ▷ Eq. 25.69
20:           if a < a_max then    ▷ suppress edges, Eq. 25.72
21:             k' ← k + (0, 0, x', y')ᵀ    ▷ refined key point
22:       else
            Move to a neighboring DoG position at same level p, q:
23:         u' ← min(1, max(−1, round(x')))    ▷ move by max. ±1
24:         v' ← min(1, max(−1, round(y')))    ▷ move by max. ±1
25:         k ← k + (0, 0, u', v')ᵀ
26:         n ← n + 1
27:   return k'    ▷ k' is either a refined key point position or nil


turns out to be inappropriate in practice, since some descriptors are more discriminative than others. The solution proposed in [153] is to compare the distance obtained for the best feature match to that of the second-best match. For a given reference descriptor s_r ∈ S^(a), the best match is defined as the descriptor s_1 ∈ S^(b) that has the smallest distance from s_r in the multi-dimensional feature space, that is,

$$ s_1 = \operatorname*{argmin}_{s_j \in S^{(b)}} \mathrm{dist}(s_r, s_j), \tag{25.102} $$
Alg. 25.5
SIFT feature extraction (part 3): Neighborhood operations. Global parameters: Q, t_extrm (see Table 25.5).

1:  IsInside(D, k)
      Checks if coordinate k = (p, q, u, v) is inside the DoG scale space D.
2:    (p, q, u, v) ← k
3:    (M, N) ← Size(GetScaleLevel(D, p, q))
4:    return (0 < u < M − 1) ∧ (0 < v < N − 1) ∧ (0 ≤ q < Q)

5:  GetNeighborhood(D, k)    ▷ k = (p, q, u, v)
      Collects and returns the 3 × 3 × 3 neighborhood values around position k in the hierarchical DoG scale space D.
6:    Create map N_c : {−1, 0, 1}³ ↦ ℝ
7:    for all (i, j, k) ∈ {−1, 0, 1}³ do    ▷ collect 3 × 3 × 3 neighborhood
8:      N_c(i, j, k) ← D_{p,q+k}(u + i, v + j)
9:    return N_c

10: IsExtremum(N_c)    ▷ N_c is a 3 × 3 × 3 map
      Determines if the center of the 3D neighborhood N_c is either a local minimum or maximum by the threshold t_extrm ≥ 0. Returns a boolean value (i.e., true or false).
11:   c ← N_c(0, 0, 0)    ▷ center DoG value
12:   isMin ← c < 0 ∧ (c + t_extrm) < min_{(i,j,k) ≠ (0,0,0)} N_c(i, j, k)    ▷ see Eq. 25.54
13:   isMax ← c > 0 ∧ (c − t_extrm) > max_{(i,j,k) ≠ (0,0,0)} N_c(i, j, k)    ▷ see Eq. 25.55
14:   return isMin ∨ isMax

15: Gradient(N_c)    ▷ N_c is a 3 × 3 × 3 map
      Returns the estimated gradient vector (∇) for the 3D neighborhood N_c.
16:   d_x ← 0.5 · (N_c(1, 0, 0) − N_c(−1, 0, 0))    ▷ see Eq. 25.56
17:   d_y ← 0.5 · (N_c(0, 1, 0) − N_c(0, −1, 0))
18:   d_σ ← 0.5 · (N_c(0, 0, 1) − N_c(0, 0, −1))
19:   ∇ ← (d_x, d_y, d_σ)ᵀ
20:   return ∇

21: Hessian(N_c)    ▷ N_c is a 3 × 3 × 3 map
      Returns the estimated Hessian matrix (H) for the neighborhood N_c.
22:   d_xx ← N_c(−1, 0, 0) − 2 · N_c(0, 0, 0) + N_c(1, 0, 0)    ▷ see Eq. 25.58
23:   d_yy ← N_c(0, −1, 0) − 2 · N_c(0, 0, 0) + N_c(0, 1, 0)
24:   d_σσ ← N_c(0, 0, −1) − 2 · N_c(0, 0, 0) + N_c(0, 0, 1)
25:   d_xy ← [N_c(1, 1, 0) − N_c(−1, 1, 0) − N_c(1, −1, 0) + N_c(−1, −1, 0)] / 4
26:   d_xσ ← [N_c(1, 0, 1) − N_c(−1, 0, 1) − N_c(1, 0, −1) + N_c(−1, 0, −1)] / 4
27:   d_yσ ← [N_c(0, 1, 1) − N_c(0, −1, 1) − N_c(0, 1, −1) + N_c(0, −1, −1)] / 4
28:   H ← ((d_xx, d_xy, d_xσ), (d_xy, d_yy, d_yσ), (d_xσ, d_yσ, d_σσ))
29:   return H


and the primary distance is d_{r,1} = dist(s_r, s_1). Analogously, the second-best matching descriptor is

$$ s_2 = \operatorname*{argmin}_{s_j \in S^{(b)},\; s_j \neq s_1} \mathrm{dist}(s_r, s_j), \tag{25.103} $$

and the corresponding distance is d_{r,2} = dist(s_r, s_2), with d_{r,1} ≤ d_{r,2}. Reliable matches are expected to have a distance to the primary
25 Scale-Invariant Feature Transform (SIFT)

Alg. 25.6
SIFT feature extraction (part 4): Key point orientation assignment. Global parameters: n_smooth, t_domor (see Table 25.5).

1:  GetDominantOrientations(G, k')
      Input: G, hierarchical Gaussian scale space; k' = (p, q, x, y), refined key point at octave p, scale level q and spatial position x, y (in octave's coordinates).
      Returns a list of dominant orientations for the key point k'.
2:    h_φ ← GetOrientationHistogram(G, k')    ▷ Alg. 25.7
3:    SmoothCircular(h_φ, n_smooth)
4:    A ← FindPeakOrientations(h_φ)
5:    return A

6:  SmoothCircular(x, n_iter)
      Smooths the real-valued vector x = (x_0, ..., x_{n−1}) circularly using the 3-element kernel H = (h_0, h_1, h_2), with h_1 as the hot-spot. The filter operation is applied n_iter times and "in place", i.e., the vector x is modified.
7:    (h_0, h_1, h_2) ← ¼ · (1, 2, 1)    ▷ 1D filter kernel
8:    n ← Size(x)
9:    for i ← 1, ..., n_iter do
10:     s ← x(0)
11:     p ← x(n − 1)
12:     for j ← 0, ..., n − 2 do
13:       c ← x(j)
14:       x(j) ← h_0 · p + h_1 · x(j) + h_2 · x(j + 1)
15:       p ← c
16:     x(n − 1) ← h_0 · p + h_1 · x(n − 1) + h_2 · s
17:   return

18: FindPeakOrientations(h_φ)
      Returns a (possibly empty) sequence of dominant directions (angles) obtained from the orientation histogram h_φ.
19:   n ← Size(h_φ)
20:   A ← ( )
21:   h_max ← max_{0 ≤ i < n} h_φ(i)
22:   for k ← 0, ..., n − 1 do
23:     h_c ← h_φ(k)
24:     if h_c > t_domor · h_max then    ▷ only accept dominant peaks
25:       h_p ← h_φ((k − 1) mod n)
26:       h_n ← h_φ((k + 1) mod n)
27:       if (h_c > h_p) ∧ (h_c > h_n) then    ▷ local max. at index k
28:         k̆ ← k + (h_p − h_n) / (2 · (h_p − 2 · h_c + h_n))    ▷ quadr. interpol., Eq. 25.85
29:         θ ← (k̆ · 2π / n) mod 2π    ▷ domin. orientation, Eq. 25.86
30:         A ← A ⌢ (θ)
31:   return A


feature s_1 that is considerably smaller than the distance to any other feature in the target set. In the case of a weak or ambiguous match, on the other hand, it is likely that other matches exist at a distance similar to d_{r,1}, including the second-best match s_2. Comparing the best and the second-best distances thus provides information about the likelihood of a false match. For this purpose, we define the feature distance ratio

$$ \rho_{\mathrm{match}}(s_r, s_1, s_2) := \frac{d_{r,1}}{d_{r,2}} = \frac{\mathrm{dist}(s_r, s_1)}{\mathrm{dist}(s_r, s_2)}, \tag{25.104} $$


Alg. 25.7
SIFT feature extraction (part 5): Calculation of the orientation histogram and gradients from Gaussian scale levels. Global parameters: n_orient (see Table 25.5).

1:  GetOrientationHistogram(G, k')
      Input: G, hierarchical Gaussian scale space; k' = (p, q, x, y), refined key point at octave p, scale level q and relative position x, y.
      Returns the gradient orientation histogram for key point k'.
2:    G_{p,q} ← GetScaleLevel(G, p, q)
3:    (M, N) ← Size(G_{p,q})
4:    Create a new map h_φ : [0, n_orient − 1] ↦ ℝ    ▷ new histogram
5:    for i ← 0, ..., n_orient − 1 do    ▷ initialize to zero
6:      h_φ(i) ← 0
7:    σ_w ← 1.5 · σ_0 · 2^{q/Q}    ▷ σ of Gaussian weight fun., see Eq. 25.76
8:    r_w ← max(1, 2.5 · σ_w)    ▷ rad. of weight fun., see Eq. 25.77
9:    u_min ← max(⌊x − r_w⌋, 1)
10:   u_max ← min(⌈x + r_w⌉, M − 2)
11:   v_min ← max(⌊y − r_w⌋, 1)
12:   v_max ← min(⌈y + r_w⌉, N − 2)
13:   for u ← u_min, ..., u_max do
14:     for v ← v_min, ..., v_max do
15:       r² ← (u − x)² + (v − y)²
16:       if r² < r_w² then
17:         (E, φ) ← GetGradientPolar(G_{p,q}, u, v)    ▷ see below
18:         w_G ← exp(−((u − x)² + (v − y)²) / (2σ_w²))    ▷ Gaussian weight
19:         z ← E · w_G    ▷ quantity to accumulate
20:         κ_φ ← (n_orient / 2π) · φ    ▷ κ_φ ∈ [−n_orient/2, +n_orient/2]
21:         α ← κ_φ − ⌊κ_φ⌋    ▷ α ∈ [0, 1]
22:         k_0 ← ⌊κ_φ⌋ mod n_orient    ▷ lower bin index
23:         k_1 ← (k_0 + 1) mod n_orient    ▷ upper bin index
24:         h_φ(k_0) ← h_φ(k_0) + (1 − α) · z    ▷ update bin k_0
25:         h_φ(k_1) ← h_φ(k_1) + α · z    ▷ update bin k_1
26:   return h_φ

27: GetGradientPolar(G_{p,q}, u, v)
      Returns the gradient magnitude (E) and orientation (φ) at position (u, v) of the Gaussian scale level G_{p,q}.
28:   (d_x, d_y) ← 0.5 · (G_{p,q}(u + 1, v) − G_{p,q}(u − 1, v), G_{p,q}(u, v + 1) − G_{p,q}(u, v − 1))    ▷ gradient at u, v
29:   E ← (d_x² + d_y²)^{1/2}    ▷ gradient magnitude
30:   φ ← ArcTan(d_x, d_y)    ▷ gradient orientation (−π < φ ≤ π)
31:   return (E, φ)


such that ρ_match ∈ [0, 1]. If the distance d_{r,1} between s_r and the primary feature is small compared to the secondary distance d_{r,2}, then the value of ρ_match is small as well. Thus, large values of ρ_match indicate that the corresponding match (between s_r and s_1) is likely to be weak or ambiguous. Matches are only accepted if they are sufficiently distinctive, for example, by enforcing the condition

$$ \rho_{\mathrm{match}}(s_r, s_1, s_2) \leq \rho_{\max}, \tag{25.105} $$

where ρ_max ∈ [0, 1] is a predefined constant (see Table 25.5). The complete matching process, using the Euclidean distance norm and sequential search, is summarized in Alg. 25.11. Other common options for distance measurement are the L_1 and L_∞ norms.
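A minimal sketch of sequential (brute-force) matching with the ratio test of Eqn. (25.105), using the Euclidean distance. It is not a transcription of Alg. 25.11; the class RatioMatcher, its Match type, and the rule of treating d_{r,2} = 0 as ambiguous are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of matching with the distance ratio test, Eqns. (25.102)-(25.105). */
class RatioMatcher {
    /** A match between index ia in set A and index ib in set B. */
    static final class Match {
        final int ia, ib;
        final double dist, ratio;
        Match(int ia, int ib, double dist, double ratio) {
            this.ia = ia; this.ib = ib; this.dist = dist; this.ratio = ratio;
        }
    }

    static double dist(int[] fa, int[] fb) {  // Euclidean distance, Eqn. (25.101)
        double s = 0;
        for (int m = 0; m < fa.length; m++) {
            double d = fa[m] - fb[m];
            s += d * d;
        }
        return Math.sqrt(s);
    }

    /** Returns all pairs whose distance ratio d1/d2 does not exceed rhoMax (e.g., 0.8). */
    static List<Match> match(List<int[]> setA, List<int[]> setB, double rhoMax) {
        List<Match> matches = new ArrayList<>();
        if (setB.size() < 2) return matches;  // the ratio needs a second-best candidate
        for (int ia = 0; ia < setA.size(); ia++) {
            int best = -1;
            double d1 = Double.POSITIVE_INFINITY, d2 = Double.POSITIVE_INFINITY;
            for (int ib = 0; ib < setB.size(); ib++) {
                double d = dist(setA.get(ia), setB.get(ib));
                if (d < d1) {         // new best; the old best becomes second best
                    d2 = d1;
                    d1 = d;
                    best = ib;
                } else if (d < d2) {  // new second best
                    d2 = d;
                }
            }
            double ratio = (d2 > 0) ? d1 / d2 : 1.0;  // two zero distances count as ambiguous
            if (ratio <= rhoMax) matches.add(new Match(ia, best, d1, ratio));
        }
        return matches;
    }
}
```

Tracking only the two smallest distances keeps the inner loop to one comparison chain per candidate. The overall cost is still N_a · N_b distance evaluations, which is why large descriptor databases usually call for a tree-based or approximate nearest-neighbor search instead of the sequential scan.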

Alg. 25.8
SIFT feature extraction (part 6): Calculation of SIFT descriptors. Global parameters: Q, σ_0, s_d, n_spat, n_angl (see Table 25.5).

MakeSiftDescriptor(G, k', θ)
Input: G, hierarchical Gaussian scale space; k' = (p, q, x, y), refined key point; θ, dominant orientation.
Returns a new SIFT descriptor for the key point k'.

    G_{p,q} ← GetScaleLevel(G, p, q)
    (M, N) ← Size(G_{p,q})
    σ̇_q ← σ_0 · 2^{q/Q}                  ▷ decimated scale at level q
    w_d ← s_d · σ̇_q                       ▷ descriptor size is prop. to key point scale
    σ_d ← 0.25 · w_d                      ▷ width of Gaussian weighting function
    r_d ← 2.5 · σ_d                       ▷ cutoff radius of weighting function
    u_min ← max(⌊x − r_d⌋, 1)
    u_max ← min(⌈x + r_d⌉, M − 2)
    v_min ← max(⌊y − r_d⌋, 1)
    v_max ← min(⌈y + r_d⌉, N − 2)
    Create map h∇: n_spat × n_spat × n_angl → ℝ      ▷ gradient histogram h∇
    for all (i, j, k) ∈ n_spat × n_spat × n_angl do
        h∇(i, j, k) ← 0                   ▷ initialize h∇ to zero
    for u ← u_min, ..., u_max do
        for v ← v_min, ..., v_max do
            r² ← (u − x)² + (v − y)²
            if r² < r_d² then
                Map to canonical coord. frame, with u', v' ∈ [−1/2, +1/2]:
                (u', v')ᵀ ← (1/w_d) · [cos(−θ)  −sin(−θ); sin(−θ)  cos(−θ)] · (u − x, v − y)ᵀ
                (E, φ) ← GetGradientPolar(G_{p,q}, u, v)      ▷ Alg. 25.7
                φ' ← (φ − θ) mod 2π                           ▷ normalize gradient angle
                w_G ← exp(−r² / (2σ_d²))                      ▷ Gaussian weight
                z ← E · w_G                                   ▷ quantity to accumulate
                UpdateGradientHistogram(h∇, u', v', φ', z)    ▷ Alg. 25.9
    f_sift ← MakeSiftFeatureVector(h∇)     ▷ see Alg. 25.10
    σ ← σ_0 · 2^{p + q/Q}                  ▷ absolute scale, Eq. 25.35
    (x', y')ᵀ ← 2^p · (x, y)ᵀ              ▷ real position, Eq. 25.45
    s ← ⟨x', y', σ, θ, f_sift⟩             ▷ create a new SIFT descriptor
    return s
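The geometric steps of MakeSiftDescriptor (the descriptor width w_d, the mapping of a pixel (u, v) into the canonical descriptor frame, and the Gaussian weight w_G) can be sketched in Java as shown below. This is a minimal illustration under stated assumptions, not the book's implementation: the class DescriptorGeometry and its method names are hypothetical, and the key point (x, y) and orientation θ are assumed to be given in the coordinates of scale level G_{p,q}.

```java
/**
 * Hypothetical helper illustrating the descriptor geometry of Alg. 25.8
 * (names are illustrative, not part of the imagingbook library).
 */
final class DescriptorGeometry {

    /** Descriptor width w_d = s_d * sigmaDot_q, with sigmaDot_q = sigma_0 * 2^(q/Q). */
    static double descriptorWidth(double sigma0, int q, int Q, double sd) {
        double sigmaQ = sigma0 * Math.pow(2, (double) q / Q); // decimated scale at level q
        return sd * sigmaQ;
    }

    /**
     * Maps pixel (u, v) into the canonical frame of a key point at (x, y) with
     * orientation theta: translate by -(x, y), rotate by -theta, scale by 1/w_d.
     */
    static double[] toCanonical(double u, double v, double x, double y, double theta, double wd) {
        double dx = u - x, dy = v - y;
        double c = Math.cos(-theta), s = Math.sin(-theta);
        double uc = (c * dx - s * dy) / wd;
        double vc = (s * dx + c * dy) / wd;
        return new double[] {uc, vc};
    }

    /** Gaussian weight w_G = exp(-r2 / (2 sigma_d^2)), with sigma_d = 0.25 * w_d. */
    static double gaussianWeight(double r2, double wd) {
        double sigmaD = 0.25 * wd;
        return Math.exp(-r2 / (2 * sigmaD * sigmaD));
    }

    public static void main(String[] args) {
        double wd = descriptorWidth(1.6, 1, 3, 10.0);   // example parameter values only
        double[] uv = toCanonical(12, 7, 10, 10, Math.PI / 6, wd);
        System.out.printf("w_d = %.3f, u' = %.3f, v' = %.3f%n", wd, uv[0], uv[1]);
    }
}
```

For θ = 0, a pixel located w_d/2 to the right of the key point maps to u' = 0.5, i.e., onto the right border of the canonical descriptor square.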


25.5.2  Examples 

The following examples were calculated on pairs of stereographic images taken at the beginning of the 20th century.26 From each of the two frames of a stereo picture, a set of approximately 1000 SIFT descriptors (marked by blue rectangles) was extracted with identical parameter settings. Matching was done by enumerating all possible descriptor pairs from the left and the right image and calculating their (Euclidean) distance. Only the
26 The images used in Figs. 25.28-25.31 are historic stereographs made publicly available by the Library of Congress (www.loc.gov).


Alg. 25.9
SIFT feature extraction (part 7): Updating the gradient descriptor histogram. The quantity z pertaining to the continuous position (u', v', φ') is to be accumulated into the 3D histogram h∇ (u', v' are normalized spatial coordinates, φ' is the orientation). The quantity z is distributed over up to eight neighboring histogram bins (see Fig. 25.26) by tri-linear interpolation. Note that the orientation coordinate φ' receives special treatment because it is circular. Global parameters: n_spat, n_angl (see Table 25.5).

UpdateGradientHistogram(h∇, u', v', φ', z)
Input: h∇, gradient histogram of size n_spat × n_spat × n_angl, with h∇(i, j, k) ∈ ℝ; u', v' ∈ [−0.5, 0.5], normalized spatial position; φ' ∈ [0, 2π), normalized gradient orientation; z ∈ ℝ, quantity to be accumulated into h∇.
Returns nothing but modifies the histogram h∇.

    i' ← n_spat · u' + 0.5 · (n_spat − 1)     ▷ see Eq. 25.92
    j' ← n_spat · v' + 0.5 · (n_spat − 1)     ▷ −0.5 ≤ i', j' ≤ n_spat − 0.5
    k' ← φ' · n_angl / (2π)
    i0 ← ⌊i'⌋
    i1 ← i0 + 1
    𝐢 ← (i0, i1)              ▷ see Eq. 25.93; 𝐢(0) = i0, 𝐢(1) = i1
    j0 ← ⌊j'⌋
    j1 ← j0 + 1
    𝐣 ← (j0, j1)              ▷ 𝐣(0) = j0, 𝐣(1) = j1
    k0 ← ⌊k'⌋ mod n_angl
    k1 ← (k0 + 1) mod n_angl
    𝐤 ← (k0, k1)              ▷ 𝐤(0) = k0, 𝐤(1) = k1
    α0 ← i1 − i'              ▷ see Eq. 25.94
    α1 ← 1 − α0
    A ← (α0, α1)              ▷ A(0) = α0, A(1) = α1
    β0 ← j1 − j'
    β1 ← 1 − β0
    B ← (β0, β1)              ▷ B(0) = β0, B(1) = β1
    γ0 ← 1 − (k' − ⌊k'⌋)
    γ1 ← 1 − γ0
    C ← (γ0, γ1)              ▷ C(0) = γ0, C(1) = γ1
    Distribute quantity z among (up to) 8 adjacent histogram bins:
    for all a ∈ {0, 1} do
        i ← 𝐢(a)
        if (0 ≤ i < n_spat) then
            w_a ← A(a)
            for all b ∈ {0, 1} do
                j ← 𝐣(b)
                if (0 ≤ j < n_spat) then
                    w_b ← B(b)
                    for all c ∈ {0, 1} do
                        k ← 𝐤(c)
                        w_c ← C(c)
                        h∇(i, j, k) ← h∇(i, j, k) + z · w_a · w_b · w_c    ▷ see Eq. 25.95
    return
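The tri-linear distribution performed by UpdateGradientHistogram maps quite directly to Java. The sketch below is a hypothetical stand-alone version: the class name and the double[][][] layout h[i][j][k] are our assumptions, not the imagingbook API. It follows Alg. 25.9 step by step, skips spatial bins that fall outside the histogram, and wraps the orientation index with Math.floorMod.

```java
/**
 * Sketch of UpdateGradientHistogram (Alg. 25.9): distributes z over up to eight
 * neighboring bins of h[i][j][k] by tri-linear interpolation. Names and the
 * array layout are illustrative assumptions, not the imagingbook API.
 */
final class GradientHistogramUpdate {

    static void update(double[][][] h, double u, double v, double phi, double z) {
        final int nSpat = h.length;          // n_spat (h is nSpat x nSpat x nAngl)
        final int nAngl = h[0][0].length;    // n_angl
        // continuous histogram coordinates (Eq. 25.92)
        double ic = nSpat * u + 0.5 * (nSpat - 1);
        double jc = nSpat * v + 0.5 * (nSpat - 1);
        double kc = phi * nAngl / (2 * Math.PI);
        // neighboring bin indices; the orientation index wraps around
        int i0 = (int) Math.floor(ic), j0 = (int) Math.floor(jc);
        int k0 = Math.floorMod((int) Math.floor(kc), nAngl);
        int[] iIdx = {i0, i0 + 1};
        int[] jIdx = {j0, j0 + 1};
        int[] kIdx = {k0, (k0 + 1) % nAngl};
        // interpolation weights (Eq. 25.94)
        double a0 = (i0 + 1) - ic, b0 = (j0 + 1) - jc;
        double c0 = 1 - (kc - Math.floor(kc));
        double[] A = {a0, 1 - a0}, B = {b0, 1 - b0}, C = {c0, 1 - c0};
        for (int a = 0; a < 2; a++) {
            int i = iIdx[a];
            if (i < 0 || i >= nSpat) continue;   // spatial bin outside the histogram
            for (int b = 0; b < 2; b++) {
                int j = jIdx[b];
                if (j < 0 || j >= nSpat) continue;
                for (int c = 0; c < 2; c++) {
                    h[i][j][kIdx[c]] += z * A[a] * B[b] * C[c];  // Eq. 25.95
                }
            }
        }
    }

    public static void main(String[] args) {
        double[][][] h = new double[4][4][8];    // n_spat = 4, n_angl = 8 (example values)
        update(h, 0.1, -0.2, 1.0, 1.0);
        System.out.println("h[2][1][1] = " + h[2][1][1]);
    }
}
```

When all eight target bins lie inside the histogram, the weights sum to 1, so the full quantity z is preserved; near the spatial border part of z is simply dropped.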


best 25 matches are shown in the examples. Feature matches are numbered by match quality, that is, label "1" denotes the best-matching descriptor pair (with the smallest feature distance). Selected details from these results are shown in Fig. 25.29. Unless otherwise noted, all SIFT parameters are set to their default values (see Table 25.5).

Although the use of the Euclidean (L2) norm for measuring the distances between feature vectors in Eqn. (25.101) is suggested in [153], other norms have been considered [130,181,227] to improve

Alg. 25.10
SIFT feature extraction (part 8): Converting the orientation histogram to a SIFT feature vector. Global parameters: n_spat, n_angl, t_fclip, s_fscale (see Table 25.5).

MakeSiftFeatureVector(h∇)
Input: h∇, gradient histogram of size n_spat × n_spat × n_angl.
Returns a 1D integer (unsigned byte) vector obtained from h∇.

    Create map f: [0, n_spat² · n_angl − 1] → ℝ       ▷ new 1D vector f
    m ← 0
    for i ← 0, ..., n_spat − 1 do                     ▷ flatten h∇ into f
        for j ← 0, ..., n_spat − 1 do
            for k ← 0, ..., n_angl − 1 do
                f(m) ← h∇(i, j, k)
                m ← m + 1
    Normalize(f)
    ClipPeaks(f, t_fclip)
    Normalize(f)
    f_sift ← MapToBytes(f, s_fscale)
    return f_sift

Normalize(x)
Scales vector x to unit norm. Returns nothing, but x is modified.

    n ← Size(x)
    s ← Σ_{i=0}^{n−1} x(i)
    for i ← 0, ..., n − 1 do
        x(i) ← (1/s) · x(i)
    return

ClipPeaks(x, x_max)
Limits the elements of x to x_max. Returns nothing, but x is modified.

    n ← Size(x)
    for i ← 0, ..., n − 1 do
        x(i) ← min(x(i), x_max)
    return

MapToBytes(x, s)
Converts the real-valued vector x to an integer (unsigned byte) valued vector with elements in [0, 255], using the scale factor s > 0.

    n ← Size(x)
    Create a new map x_int: [0, n − 1] → [0, 255]     ▷ new byte vector
    for i ← 0, ..., n − 1 do
        a ← round(s · x(i))                           ▷ a ∈ ℕ_0
        x_int(i) ← min(a, 255)                        ▷ x_int(i) ∈ [0, 255]
    return x_int
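The helper procedures of Alg. 25.10 and the flattening step translate into a few lines of Java. In the following sketch (hypothetical class and method names, not the imagingbook API), normalize divides by the element sum s, exactly as written in the pseudocode; for the non-negative histogram entries this is a unit-norm scaling in the L1 sense. The guard against an all-zero vector is our own addition.

```java
/**
 * Sketch of MakeSiftFeatureVector and its helpers (Alg. 25.10). Method names
 * mirror the pseudocode; this is not the imagingbook implementation.
 */
final class SiftFeatureVector {

    /** Flattens h[i][j][k] into a 1D vector (i outermost, k innermost). */
    static double[] flatten(double[][][] h) {
        final int nSpat = h.length, nAngl = h[0][0].length;
        double[] f = new double[nSpat * nSpat * nAngl];
        int m = 0;
        for (int i = 0; i < nSpat; i++)
            for (int j = 0; j < nSpat; j++)
                for (int k = 0; k < nAngl; k++)
                    f[m++] = h[i][j][k];
        return f;
    }

    /** Normalize(x): divides every element by the element sum s. */
    static void normalize(double[] x) {
        double s = 0;
        for (double xi : x) s += xi;
        if (s == 0) return;                      // all-zero guard (our addition)
        for (int i = 0; i < x.length; i++) x[i] /= s;
    }

    /** ClipPeaks(x, xMax): limits every element to at most xMax. */
    static void clipPeaks(double[] x, double xMax) {
        for (int i = 0; i < x.length; i++) x[i] = Math.min(x[i], xMax);
    }

    /** MapToBytes(x, s): scales, rounds and saturates each element to [0, 255]. */
    static int[] mapToBytes(double[] x, double s) {
        int[] xInt = new int[x.length];
        for (int i = 0; i < x.length; i++) {
            long a = Math.round(s * x[i]);       // a >= 0 for non-negative x
            xInt[i] = (int) Math.min(a, 255);
        }
        return xInt;
    }

    static int[] makeSiftFeatureVector(double[][][] h, double tClip, double sScale) {
        double[] f = flatten(h);
        normalize(f);
        clipPeaks(f, tClip);
        normalize(f);
        return mapToBytes(f, sScale);
    }

    public static void main(String[] args) {
        double[][][] h = {{{1, 3}}};             // toy 1 x 1 x 2 histogram
        System.out.println(java.util.Arrays.toString(makeSiftFeatureVector(h, 0.5, 300)));
    }
}
```

The second normalization matters: after clipping, the vector no longer sums to 1, so the scale factor s_fscale would otherwise act on an inconsistent range.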

the statistical robustness and noise resistance. In Fig. 25.30, matching results are shown using the L1, L2, and L∞ norms, respectively. Note that the resulting sets of top-ranking matches are almost the same with different distance norms, but the ordering of the strongest matches does change.

Figure 25.31 demonstrates the effectiveness of selecting feature matches based on the ratio between the distances to the best and the second-best match (see Eqns. (25.102)-(25.103)). Again the figure shows the 25 top-ranking matches based on the minimum (L2) feature distance. With the maximum distance ratio ρ_max set to 1.0, rejection is practically turned off, with the result that several false or ambiguous matches are among the top-ranking feature matches (Fig. 25.31(a)).
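The three feature distances compared in Fig. 25.30 differ only in how the element-wise differences are combined: summed (L1), summed in squares with a final square root (L2), or reduced to their maximum (L∞). A minimal sketch for int-valued feature vectors follows; the class FeatureDistances and its Norm enum are hypothetical names chosen for illustration.

```java
/** Sketch: feature distance under the L1, L2 and L-infinity norms (names are hypothetical). */
final class FeatureDistances {

    enum Norm { L1, L2, LINF }

    static double dist(int[] fa, int[] fb, Norm norm) {
        if (fa.length != fb.length) {
            throw new IllegalArgumentException("feature vectors differ in length");
        }
        double sum = 0, max = 0;
        for (int i = 0; i < fa.length; i++) {
            double d = Math.abs(fa[i] - fb[i]);   // element-wise absolute difference
            sum += (norm == Norm.L2) ? d * d : d; // accumulated for L1 and L2
            max = Math.max(max, d);               // used for L-infinity
        }
        switch (norm) {
            case L1:  return sum;
            case L2:  return Math.sqrt(sum);
            default:  return max;                 // LINF
        }
    }

    public static void main(String[] args) {
        int[] fa = {0, 0, 0}, fb = {3, 4, 0};
        for (Norm n : Norm.values()) {
            System.out.println(n + ": " + dist(fa, fb, n));
        }
    }
}
```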


Alg. 25.11
SIFT feature matching using Euclidean feature distance and linear search. The returned sequence of SIFT matches is sorted to ascending distance between corresponding feature pairs. Function Dist(s_a, s_b) demonstrates the calculation of the Euclidean (L2) feature distance; other options are the L1 and L∞ norms.

MatchDescriptors(S^(a), S^(b), ρ_max)
Input: S^(a), S^(b), two sets of SIFT descriptors; ρ_max, max. ratio of best and second-best matching distance (see Eq. 25.105).
Returns a sorted list of matches m_ij = ⟨s_a, s_b, d_ij⟩, with s_a ∈ S^(a), s_b ∈ S^(b) and d_ij being the distance between s_a, s_b in feature space.

    M ← ( )                                  ▷ empty sequence of matches
    for all s_a ∈ S^(a) do
        s_1 ← nil, d_{r,1} ← ∞               ▷ best nearest neighbor
        s_2 ← nil, d_{r,2} ← ∞               ▷ second-best nearest neighbor
        for all s_b ∈ S^(b) do
            d ← Dist(s_a, s_b)
            if d < d_{r,1} then              ▷ d is a new 'best' distance
                s_2 ← s_1, d_{r,2} ← d_{r,1}
                s_1 ← s_b, d_{r,1} ← d
            else
                if d < d_{r,2} then          ▷ d is a new second-best distance
                    s_2 ← s_b, d_{r,2} ← d
        if (s_2 ≠ nil) ∧ (d_{r,1} / d_{r,2} < ρ_max) then    ▷ Eqns. (25.104-25.105)
            m ← ⟨s_a, s_1, d_{r,1}⟩          ▷ add a new match
            append m to M
    Sort(M)                                  ▷ sort M to ascending distance d_{r,1}
    return M

Dist(s_a, s_b)
Input: descriptors s_a = ⟨x_a, y_a, σ_a, θ_a, f_a⟩, s_b = ⟨x_b, y_b, σ_b, θ_b, f_b⟩.
Returns the Euclidean distance between feature vectors f_a and f_b.

    d ← ‖f_a − f_b‖
    return d
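A compact Java rendering of MatchDescriptors with the ratio test is sketched below. It rests on simplifying assumptions: descriptors are reduced to their int[] feature vectors, matches refer to list indices, and the L2 distance is hard-wired. The class RatioTestMatcher is hypothetical and is not the library's SiftMatcher.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of MatchDescriptors (Alg. 25.11): linear search plus best/second-best ratio test. */
final class RatioTestMatcher {

    /** A match between descriptor ia of set a and descriptor ib of set b. */
    static final class Match {
        final int ia, ib;
        final double dist;
        Match(int ia, int ib, double dist) { this.ia = ia; this.ib = ib; this.dist = dist; }
    }

    /** Euclidean (L2) distance between two feature vectors, as in Dist(s_a, s_b). */
    static double dist(int[] fa, int[] fb) {
        double sum = 0;
        for (int i = 0; i < fa.length; i++) {
            double d = fa[i] - fb[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    static List<Match> matchDescriptors(List<int[]> sa, List<int[]> sb, double rhoMax) {
        List<Match> matches = new ArrayList<>();
        for (int a = 0; a < sa.size(); a++) {
            int best = -1, second = -1;          // indices of nearest and second-nearest neighbor
            double d1 = Double.POSITIVE_INFINITY, d2 = Double.POSITIVE_INFINITY;
            for (int b = 0; b < sb.size(); b++) {
                double d = dist(sa.get(a), sb.get(b));
                if (d < d1) {                    // d is a new best distance
                    second = best; d2 = d1;
                    best = b; d1 = d;
                } else if (d < d2) {             // d is a new second-best distance
                    second = b; d2 = d;
                }
            }
            // keep the match only if it is clearly better than the runner-up
            if (second >= 0 && d1 / d2 < rhoMax) {
                matches.add(new Match(a, best, d1));
            }
        }
        matches.sort((m1, m2) -> Double.compare(m1.dist, m2.dist)); // ascending distance
        return matches;
    }

    public static void main(String[] args) {
        List<int[]> sa = List.of(new int[] {0, 0}, new int[] {5, 5});
        List<int[]> sb = List.of(new int[] {1, 0}, new int[] {10, 0}, new int[] {5, 6});
        for (Match m : matchDescriptors(sa, sb, 0.8)) {
            System.out.println(m.ia + " -> " + m.ib + " (d = " + m.dist + ")");
        }
    }
}
```

As in the pseudocode, a descriptor with only one candidate in the other set is never matched, since no second-best distance exists to compare against.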


With ρ_max set to 0.8 and finally 0.5, the number of false matches is effectively reduced (Fig. 25.31(b, c)).27


25.6  Efficient  Feature  Matching 

The task of finding the best match based on the minimum distance in feature space is called "nearest-neighbor" search. If performed exhaustively, evaluating all possible matches between two descriptor sets S^(a) and S^(b) of size N_a and N_b, respectively, requires N_a · N_b feature distance calculations and comparisons; two sets of 1000 descriptors each thus already require 10⁶ distance calculations. While this may be acceptable for small feature sets (with maybe up to 1000 descriptors each), this linear (brute-force) approach becomes prohibitively expensive for large feature sets with possibly millions of candidates, as required, for example, in the context of image database indexing or robot self-localization. Although efficient methods for exact nearest-neighbor search based on tree structures exist, such as the k-d tree method [80], it has been shown that these methods lose their effectiveness with increasing dimensionality of the search space.


27 ρ_max = 0.8 is recommended in [153].
Fig. 25.28 SIFT feature matching examples on pairs of stereo images. Shown are the 25 best matches obtained with the L2 feature distance and ρ_max = 0.8.


In fact, no algorithms are known that significantly outperform exhaustive (linear) nearest-neighbor search in feature spaces with more than about 10 dimensions [153]. SIFT feature vectors are 128-dimensional and therefore exact nearest-neighbor search is not a viable option for efficient matching between large descriptor sets.

The approach taken in [21,153] abandons exact nearest-neighbor search in favor of finding an approximate solution with substantially reduced effort, based on ideas described in [9]. This so-called "best-bin-first" method uses a modified k-d algorithm, which searches neighboring feature space partitions in the order of their closest distance from the given feature vector. To limit the exploration to a small fraction of the feature space, the search is cut off after checking the first 200 candidates, which results in a substantial speedup without substantially compromising the search results, particularly when combined with feature selection based on the ratio of primary and secondary distances (see Eqns. (25.104)-(25.105)). Additional details can be found in [21].

Fig. 25.29 Stereo matching examples (enlarged details from Fig. 25.28), showing the left and right frames.
Fig. 25.30 Using different distance norms for feature matching: L1 (a), L2 (b), and L∞ norm (c). All other parameters are set to their default values (see Table 25.5).


Approximate nearest-neighbor search in high-dimensional spaces is not only essential for practical SIFT matching in real time but is also a general problem with numerous applications in various disciplines and the subject of continued research. Open-source implementations of several different methods are available as software libraries.
25.7 Java Implementation

A new and complete Java implementation of the SIFT method has been written from the ground up to complement the algorithms described in this chapter. Space limitations do not permit a full listing here, but the entire implementation and additional examples can be found in the source code section of this book's website. Most Java methods are named and structured identically to the procedures listed in the algorithms for easy identification. Note, however, that this implementation is again written for instructional clarity and readability. The code is neither tuned for efficiency nor intended to be used in a production environment.

Fig. 25.31 Rejection of weak or ambiguous matches by limiting the ratio of primary and secondary match distance ρ_max (see Eqns. (25.104)-(25.105)).


25.7.1  SIFT  Feature  Extraction 

The key class in this Java library is SiftDetector, which implements a SIFT detector for a given floating-point image. The following example illustrates its basic use for a given ImageProcessor object ip:

    FloatProcessor I = ip.convertToFloatProcessor();
    SiftDetector sd = new SiftDetector(I);
    List<SiftDescriptor> S = sd.getSiftFeatures();
    // ... process descriptor set S

The initial work of setting up the required Gaussian and DoG scale space structures for the given image I is accomplished by the constructor call new SiftDetector(I).

The method getSiftFeatures() then performs the actual feature detection process and returns a sequence of SiftDescriptor objects (S) for the image I. Each extracted SiftDescriptor in S holds information about its image position (x, y), its absolute scale σ (scale), and its dominant orientation θ (orientation). It also contains an invariant, 128-element, int-type feature vector f_sift (see Alg. 25.8).

The SIFT detector uses a large set of parameters that are set to their default values (see Table 25.5) if the simple constructor new SiftDetector(I) is used, as in the previous example. All parameters can be adjusted individually by passing a parameter object (of type SiftDetector.Parameters) to its constructor, as in the following example, which shows feature extraction from two images A, B using identical parameters:

    FloatProcessor Ia = A.convertToFloatProcessor();
    FloatProcessor Ib = B.convertToFloatProcessor();
    // ...
    SiftDetector.Parameters params = new SiftDetector.Parameters();
    params.sigma_s = 0.5;   // modify individual parameters
    params.sigma_0 = 1.6;
    // ...
    SiftDetector sdA = new SiftDetector(Ia, params);
    SiftDetector sdB = new SiftDetector(Ib, params);
    List<SiftDescriptor> SA = sdA.getSiftFeatures();
    List<SiftDescriptor> SB = sdB.getSiftFeatures();
    // ... process descriptor sets SA and SB
25.7.2 SIFT Feature Matching

Finding matching descriptors from a pair of SIFT descriptor sets S_a, S_b is accomplished by the class SiftMatcher.28 One descriptor set (S_a) is considered the "reference" or "model" set and used to initialize a new SiftMatcher object, as shown in the following example. The actual matches are then calculated by invoking the method matchDescriptors(), which implements the procedure MatchDescriptors() outlined in Alg. 25.11. It takes the second descriptor set (S_b) as its only argument. The following code segment continues from the previous example:


    SiftMatcher.Parameters mParams = new SiftMatcher.Parameters();
    // set matcher parameters here (see below)
    SiftMatcher matcher = new SiftMatcher(SA, mParams);
    List<SiftMatch> matches = matcher.matchDescriptors(SB);
    // ... process matches

Certain parameters of class SiftMatcher can be set individually, for example,

    mParams.norm = FeatureDistanceNorm.L1;  // L1, L2, or Linf
    mParams.rmMax = 0.8;   // ρ_max, max. ratio of best and second-best match
    mParams.sort = true;   // set to true if sorting of matches is desired

The method matchDescriptors() in this prototypical implementation performs an exhaustive search over all possible descriptor pairs in the two sets S_a and S_b. To implement efficient approximate nearest-neighbor search (see Sec. 25.6), one would pre-calculate the required search tree structures for the model descriptor set (S_a) once inside SiftMatcher's constructor method. The same matcher object could then be reused to match against multiple descriptor sets without the need to recalculate the search tree structure over and over again. This is particularly effective when the given model set is large.


25.8  Exercises 

Exercise 25.1. As claimed in Eqn. (25.12), the 2D LoG function L_σ(x, y) can be approximated by the DoG in the form L_σ(x, y) ≈ λ · (G_{κσ}(x, y) − G_σ(x, y)). Create a combined plot, similar to the one in Fig. 25.5(b), showing the 1D cross sections of the LoG and DoG functions (with σ = 1.0 and y = 0). Compare both functions by varying the values of κ = 2.00, 1.25, 1.10, 1.05, and 1.01. How does the approximation change as κ approaches 1, and what happens if κ becomes exactly 1?

Exercise 25.2. Test the performance of the SIFT feature detection and matching on pairs of related images under (a) changes of image brightness and contrast, (b) image rotation, (c) scale changes, and (d) adding (synthetic) noise. Choose (or shoot) your own test images, show the results in a suitable way, and document the parameters used.

28 File imagingbook.sift.SiftMatcher.java.

Exercise 25.3. Evaluate the SIFT mechanism for tracking features in video sequences. Search for a suitable video sequence with good features to track and process the images frame by frame.29 Then match the SIFT features detected in pairs of successive frames by connecting the best-matching features, as long as the "match quality" is above a predefined threshold. Visualize the resulting feature trajectories. Could other properties of the SIFT descriptors (such as position, scale, and dominant orientation) be used to improve tracking stability?

29 In ImageJ, choose an AVI video short enough to fit into main memory and open it as an image stack.
26 Fourier Shape Descriptors


Fourier descriptors are an interesting method for modeling 2D shapes that are described as closed contours. Unlike polylines or splines, which are explicit and local descriptions of the contour, Fourier descriptors are global shape representations, that is, each component stands for a particular characteristic of the entire shape. If one component is changed, the whole shape will change. The advantage is that it is possible to capture coarse shape properties with only a few numeric values, and the level of detail can be increased (or decreased) by adding (or removing) descriptor elements. In the following, we describe what are called "Cartesian" (or "elliptical") Fourier descriptors, how they can be used to model the shape of closed 2D contours, and how they can be adapted to compare shapes in a translation-, scale-, and rotation-invariant fashion.


26.1  Closed  Curves  in  the  Complex  Plane 

Any continuous curve C in the 2D plane can be expressed as a function f: ℝ → ℝ², with

$$
f(t) = \begin{pmatrix} x_t \\ y_t \end{pmatrix} = \begin{pmatrix} f_x(t) \\ f_y(t) \end{pmatrix}, \tag{26.1}
$$

with the continuous parameter t being varied over the range [0, t_max]. If the curve is closed, then f(0) = f(t_max) and f(t) = f(t + t_max). Note that f_x(t), f_y(t) are independent, real-valued functions, and t is the path length along the curve.

26.1.1  Discrete  2D  Curves 

Sampling a closed curve C at M regularly spaced positions t_0, t_1, ..., t_{M−1}, with t_i − t_{i−1} = Δt = Length(C)/M, results in a sequence (vector) of discrete 2D coordinates V = (v_0, v_1, ..., v_{M−1}), with

$$
v_k = (x_k, y_k) = f(t_k). \tag{26.2}
$$

© Springer-Verlag London 2016
DOI 10.1007/978-1-4471-6684-9_26


Fig. 26.1 A closed, continuous 2D curve C, represented as a sequence of M uniformly placed samples g = (g_0, g_1, ..., g_{M−1}) in the complex plane, with g_k = x_k + i·y_k ∈ ℂ.


Since the curve C is closed, the vector V represents a discrete function that is infinite and periodic, that is,

$$
v_k = v_{k+pM}, \tag{26.3}
$$

for 0 ≤ k < M and any p ∈ ℤ.

Contour  points  in  the  complex  plane 

Any 2D contour sample v_k = (x_k, y_k) can be interpreted as a point g_k in the complex plane,

$$
g_k = x_k + \mathrm{i} \cdot y_k, \tag{26.4}
$$

with x_k and y_k taken as the real and imaginary components, respectively.1 The result is a sequence (vector) of complex values

$$
g = (g_0, g_1, \ldots, g_{M-1}), \tag{26.5}
$$

representing the discrete 2D contour (see Fig. 26.1).

Regular  position  sampling 

The assumption of input data being obtained by regular sampling is quite fundamental in traditional discrete Fourier analysis. In practice, contours of objects are typically not available as regularly sampled point sequences. For example, if an object has been segmented as a binary region, the coordinates of its boundary pixels could be used as the original contour sequence. However, the number of boundary pixels is usually too large to be used directly, and their positions are not strictly uniformly spaced (at least under 8-connectivity). To produce a useful contour sequence from a region boundary, one could choose an arbitrary contour point as the start position x_0 and then sample the x/y positions along the contour at regular (equidistant) steps, treating the centers of the boundary pixels as the vertices of a closed polygon. Algorithm 26.1 shows how to calculate a predefined number of contour points on an arbitrary polygon, such that the path
1 Instead of g ← x + i·y, we sometimes use the short notation g ← (x, y) or g ← v for assigning the components of a 2D vector v = (x, y) ∈ ℝ² to a complex variable g ∈ ℂ.


Alg. 26.1
Regular sampling of a polygon path. Given a sequence V of 2D points representing the vertices of a closed polygon, SamplePolygonUniformly(V, M) returns a sequence of M complex values g on the polygon V, such that g(0) = V(0) and all remaining points g(k) are uniformly positioned along the polygon path. See Alg. 26.9 for an alternate solution.

SamplePolygonUniformly(V, M)
Input: V = (v_0, ..., v_{N−1}), a sequence of N points representing the vertices of a 2D polygon; M, number of desired sample points.
Returns a sequence g = (g_0, ..., g_{M−1}) of complex values representing points sampled uniformly along the path of the input polygon V.

    N ← |V|
    Δ ← (1/M) · PathLength(V)             ▷ const. segment length Δ
    Create map g: [0, M − 1] → ℂ           ▷ complex point sequence g
    g(0) ← Complex(V(0))
    i ← 0                                  ▷ index of polygon segment (v_i, v_{i+1})
    k ← 1                                  ▷ index of next point to be added to g
    α ← 0                                  ▷ path position of polygon vertex v_i
    β ← Δ                                  ▷ path position of next point to be added to g
    while (i < N) ∧ (k < M) do
        v_A ← V(i)
        v_B ← V((i + 1) mod N)
        δ ← ‖v_B − v_A‖                    ▷ length of segment (v_A, v_B)
        while (β < α + δ) ∧ (k < M) do
            x ← v_A + ((β − α)/δ) · (v_B − v_A)     ▷ linear path interpolation
            g(k) ← Complex(x)
            k ← k + 1
            β ← β + Δ
        α ← α + δ
        i ← i + 1
    return g

PathLength(V)                              ▷ returns the path length of the closed polygon V
    N ← |V|
    L ← 0
    for i ← 0, ..., N − 1 do
        v_A ← V(i)
        v_B ← V((i + 1) mod N)
        L ← L + ‖v_B − v_A‖
    return L


length  between  the  sample  points  is  uniform.  This  algorithm  is  used 
in  all  examples  involving  contours  obtained  from  binary  regions. 
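Alg. 26.1 can be transcribed into Java as follows. In this sketch (the class name PolygonSampler is hypothetical), both the polygon vertices and the resulting complex samples are stored as double[] pairs {x, y}, i.e., {Re, Im}, which avoids depending on any particular complex-number class.

```java
/**
 * Sketch of SamplePolygonUniformly and PathLength (Alg. 26.1). Vertices and
 * samples are stored as {x, y} pairs, i.e., {Re, Im} of the complex samples g(k).
 */
final class PolygonSampler {

    /** Path length of the closed polygon V (the last vertex connects back to V[0]). */
    static double pathLength(double[][] V) {
        final int N = V.length;
        double L = 0;
        for (int i = 0; i < N; i++) {
            double[] vA = V[i], vB = V[(i + 1) % N];
            L += Math.hypot(vB[0] - vA[0], vB[1] - vA[1]);
        }
        return L;
    }

    /** Returns M points spaced uniformly (by path length) along V, with g[0] = V[0]. */
    static double[][] sampleUniformly(double[][] V, int M) {
        final int N = V.length;
        final double delta = pathLength(V) / M;    // constant segment length
        double[][] g = new double[M][];
        g[0] = new double[] {V[0][0], V[0][1]};
        int i = 0;                                 // index of current polygon segment
        int k = 1;                                 // index of next sample
        double alpha = 0;                          // path position of vertex V[i]
        double beta = delta;                       // path position of next sample
        while (i < N && k < M) {
            double[] vA = V[i], vB = V[(i + 1) % N];
            double d = Math.hypot(vB[0] - vA[0], vB[1] - vA[1]); // segment length
            while (beta < alpha + d && k < M) {
                double t = (beta - alpha) / d;     // linear path interpolation
                g[k] = new double[] {vA[0] + t * (vB[0] - vA[0]), vA[1] + t * (vB[1] - vA[1])};
                k++;
                beta += delta;
            }
            alpha += d;
            i++;
        }
        return g;
    }

    public static void main(String[] args) {
        double[][] square = {{0, 0}, {1, 0}, {1, 1}, {0, 1}};
        for (double[] p : sampleUniformly(square, 8)) {
            System.out.println(p[0] + ", " + p[1]);
        }
    }
}
```

Because the inner loop only runs while β < α + δ, a degenerate (zero-length) segment is skipped without division by zero.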

Note that if the shape is given as an arbitrary polygon, the corresponding Fourier descriptor can also be calculated directly (and exactly) from the vertices of the polygon, without sub-sampling the polygon contour path at all. This "trigonometric" variant of the Fourier descriptor calculation is described in Sec. 26.3.7.


26.2  Discrete  Fourier  Transform  (DFT) 

Fourier descriptors are obtained by applying the 1D Discrete Fourier Transform (DFT)2 to the complex-valued vector g of 2D contour points (Eqn. (26.5)). The DFT is a transformation of a finite, complex-valued signal vector g = (g_0, g_1, ..., g_{M−1}) to a complex-valued spectrum G = (G_0, G_1, ..., G_{M−1}).3 Both the signal and the spectrum are of the same length (M) and periodic. In the following, we typically use k to denote the index in the time or space domain,4 and m for a frequency index in the spectral domain.

2 See Chapter 18, Sec. 18.3.


26.2.1  Forward  Fourier  Transform 


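The discrete Fourier spectrum G = (G_0, G_1, ..., G_{M−1}) is calculated from the discrete, complex-valued signal g = (g_0, g_1, ..., g_{M−1}) using the forward DFT, defined as5

$$
\begin{aligned}
G_m &= \frac{1}{M} \sum_{k=0}^{M-1} g_k \cdot e^{-\mathrm{i} \cdot 2\pi m \frac{k}{M}} \qquad (26.6) \\
    &= \frac{1}{M} \sum_{k=0}^{M-1} \underbrace{[x_k + \mathrm{i} \cdot y_k]}_{g_k} \cdot \Bigl[\cos\bigl(2\pi m \tfrac{k}{M}\bigr) - \mathrm{i} \cdot \sin\bigl(2\pi m \tfrac{k}{M}\bigr)\Bigr] \qquad (26.7) \\
    &= \frac{1}{M} \sum_{k=0}^{M-1} [x_k + \mathrm{i} \cdot y_k] \cdot \Bigl[\cos\bigl(\omega_m \tfrac{k}{M}\bigr) - \mathrm{i} \cdot \sin\bigl(\omega_m \tfrac{k}{M}\bigr)\Bigr], \qquad (26.8)
\end{aligned}
$$

for 0 ≤ m < M.6 Note that ω_m = 2πm denotes the angular frequency for the frequency index m. By applying the usual rules of complex multiplication, we obtain the real (Re) and imaginary (Im) parts of the spectral coefficients G_m = (A_m + i·B_m) explicitly as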


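$$
\begin{aligned}
A_m = \mathrm{Re}(G_m) &= \frac{1}{M} \sum_{k=0}^{M-1} \Bigl[x_k \cdot \cos\bigl(\omega_m \tfrac{k}{M}\bigr) + y_k \cdot \sin\bigl(\omega_m \tfrac{k}{M}\bigr)\Bigr], \qquad (26.9) \\
B_m = \mathrm{Im}(G_m) &= \frac{1}{M} \sum_{k=0}^{M-1} \Bigl[y_k \cdot \cos\bigl(\omega_m \tfrac{k}{M}\bigr) - x_k \cdot \sin\bigl(\omega_m \tfrac{k}{M}\bigr)\Bigr]. \qquad (26.10)
\end{aligned}
$$

The DFT is defined for any signal length M ≥ 1. If the signal length M is a power of two (that is, M = 2^n for some n ∈ ℕ), the Fast Fourier Transform (FFT)7 can be used in place of the DFT for improved performance.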


26.2.2  Inverse  Fourier  Transform  (Reconstruction) 


The inverse DFT reconstructs the original signal g from a given spectrum G. The formulation is almost symmetrical (except for the scale

3 In most traditional applications of the DFT (e.g., in acoustic processing), the signals are real-valued, that is, the imaginary components of the samples are zero. The Fourier spectrum is generally complex-valued, but it is (conjugate) symmetric for real-valued signals.

4 We use k instead of the usual i as the running index to avoid confusion with the imaginary constant "i" (despite the deliberate use of different glyphs).

5 This definition deviates slightly from the one used in Chapter 18, Sec. 18.3, but is otherwise equivalent.

6 Recall that z = x + i·y = |z|·(cos ψ + i·sin ψ) = |z|·e^{iψ}, with ψ = tan⁻¹(y/x).

7 See Chapter 18, Sec. 18.4.2.
Alg. 26.2
Calculating the Fourier descriptor for a sequence of uniformly sampled contour points. The complex-valued contour points in g represent 2D positions sampled uniformly along the contour path. Applying the DFT to g yields the raw Fourier descriptor G.

FourierDescriptorUniform(g)
Input: g = (g_0, ..., g_{M−1}), a sequence of M complex values, representing regularly sampled 2D points along a contour path.
Returns a Fourier descriptor G of length M.

    M ← |g|
    Create map G: [0, M − 1] → ℂ
    for m ← 0, ..., M − 1 do
        A ← 0, B ← 0                       ▷ real/imag. part of coefficient G_m
        for k ← 0, ..., M − 1 do
            g ← g(k)
            x ← Re(g), y ← Im(g)
            φ ← 2 · π · m · k/M
            A ← A + x · cos(φ) + y · sin(φ)     ▷ Eqs. 26.9-26.10
            B ← B − x · sin(φ) + y · cos(φ)
        G(m) ← (1/M) · (A + i · B)
    return G
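Alg. 26.2 (the plain O(M²) DFT of Eqns. (26.9)-(26.10)) can be sketched in Java as follows; complex values are again stored as {re, im} pairs, and the class name UniformFourierDescriptor is hypothetical. For large M one would rather use an FFT, as mentioned above.

```java
/** Sketch of FourierDescriptorUniform (Alg. 26.2); complex values are {re, im} pairs. */
final class UniformFourierDescriptor {

    static double[][] fourierDescriptorUniform(double[][] g) {
        final int M = g.length;
        double[][] G = new double[M][2];
        for (int m = 0; m < M; m++) {
            double A = 0, B = 0;                             // real/imag. part of G_m
            for (int k = 0; k < M; k++) {
                double x = g[k][0], y = g[k][1];
                double phi = 2 * Math.PI * m * k / M;
                A += x * Math.cos(phi) + y * Math.sin(phi);  // Eq. (26.9)
                B += -x * Math.sin(phi) + y * Math.cos(phi); // Eq. (26.10)
            }
            G[m][0] = A / M;                                 // scale factor 1/M
            G[m][1] = B / M;
        }
        return G;
    }

    public static void main(String[] args) {
        double[][] g = {{0, 0}, {1, 0}, {1, 1}, {0, 1}};     // toy contour (M = 4)
        double[][] G = fourierDescriptorUniform(g);
        for (int m = 0; m < G.length; m++) {
            System.out.printf("G_%d = %.3f + i*%.3f%n", m, G[m][0], G[m][1]);
        }
    }
}
```

A contour sampled from a unit circle traversed once counterclockwise, g_k = e^{i·2πk/M}, yields G_1 = 1 and all other coefficients zero, which is a convenient sanity check.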


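factor and the different signs in the exponent) to the forward transformation in Eqns. (26.6)-(26.8); its full expansion is

$$
\begin{aligned}
g_k &= \sum_{m=0}^{M-1} G_m \cdot e^{\mathrm{i} \cdot 2\pi m \frac{k}{M}} \qquad (26.11) \\
    &= \sum_{m=0}^{M-1} \underbrace{\bigl[\mathrm{Re}(G_m) + \mathrm{i} \cdot \mathrm{Im}(G_m)\bigr]}_{G_m} \cdot \Bigl[\cos\bigl(2\pi m \tfrac{k}{M}\bigr) + \mathrm{i} \cdot \sin\bigl(2\pi m \tfrac{k}{M}\bigr)\Bigr] \qquad (26.12) \\
    &= \sum_{m=0}^{M-1} \bigl[A_m + \mathrm{i} \cdot B_m\bigr] \cdot \Bigl[\cos\bigl(\omega_m \tfrac{k}{M}\bigr) + \mathrm{i} \cdot \sin\bigl(\omega_m \tfrac{k}{M}\bigr)\Bigr]. \qquad (26.13)
\end{aligned}
$$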


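Again we can expand Eqn. (26.13) to obtain the real and imaginary parts of the reconstructed signal, that is, the x/y-components of the corresponding curve points g_k = (x_k, y_k) as

$$
\begin{aligned}
x_k = \mathrm{Re}(g_k) &= \sum_{m=0}^{M-1} \Bigl[\mathrm{Re}(G_m) \cdot \cos\bigl(2\pi m \tfrac{k}{M}\bigr) - \mathrm{Im}(G_m) \cdot \sin\bigl(2\pi m \tfrac{k}{M}\bigr)\Bigr], \qquad (26.14) \\
y_k = \mathrm{Im}(g_k) &= \sum_{m=0}^{M-1} \Bigl[\mathrm{Im}(G_m) \cdot \cos\bigl(2\pi m \tfrac{k}{M}\bigr) + \mathrm{Re}(G_m) \cdot \sin\bigl(2\pi m \tfrac{k}{M}\bigr)\Bigr], \qquad (26.15)
\end{aligned}
$$

for 0 ≤ k < M. If all coefficients of the spectrum are used, this reconstruction is exact, that is, the resulting discrete points g_k are identical to the original contour points.8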
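The reconstruction of Eqns. (26.14)-(26.15) can be sketched in Java as follows; it mirrors the forward transform but omits the factor 1/M and flips the sign of the sine terms. Complex values are {re, im} pairs, and the class name FourierInverse is hypothetical.

```java
/** Sketch of the inverse DFT, Eqns. (26.14)-(26.15); complex values are {re, im} pairs. */
final class FourierInverse {

    static double[][] reconstruct(double[][] G) {
        final int M = G.length;
        double[][] g = new double[M][2];
        for (int k = 0; k < M; k++) {
            double x = 0, y = 0;
            for (int m = 0; m < M; m++) {
                double re = G[m][0], im = G[m][1];
                double phi = 2 * Math.PI * m * k / M;
                x += re * Math.cos(phi) - im * Math.sin(phi); // Eq. (26.14)
                y += im * Math.cos(phi) + re * Math.sin(phi); // Eq. (26.15)
            }
            g[k][0] = x;
            g[k][1] = y;
        }
        return g;
    }

    public static void main(String[] args) {
        double[][] G = {{3, 4}, {1, 0}, {0, 0}, {0, 0}};     // toy spectrum (M = 4)
        for (double[] p : reconstruct(G)) {
            System.out.printf("(%.3f, %.3f)%n", p[0], p[1]);
        }
    }
}
```

Applied to the output of the forward DFT sketched after Alg. 26.2, this returns the original contour points up to floating-point rounding.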

8 Apart from inaccuracies caused by finite floating-point precision.

With the aforementioned formulation we can reconstruct not only the discrete contour points g_k from the DFT spectrum, but also a smooth, interpolating curve as the sum of continuous sine and cosine components. To calculate arbitrary points on this curve, we replace the discrete quantity k/M in Eqns. (26.14)-(26.15) by the continuous parameter t in the range [0, 1). We must be careful about the frequencies, though. To achieve the desired smooth interpolation, the set of lowest possible frequencies ω_m must be used,9 that is,


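$$
x(t) = \sum_{m=0}^{M-1} \big[\mathrm{Re}(G_m) \cdot \cos(\omega_m \cdot t) - \mathrm{Im}(G_m) \cdot \sin(\omega_m \cdot t)\big], \tag{26.16}
$$

$$
y(t) = \sum_{m=0}^{M-1} \big[\mathrm{Im}(G_m) \cdot \cos(\omega_m \cdot t) + \mathrm{Re}(G_m) \cdot \sin(\omega_m \cdot t)\big], \tag{26.17}
$$

with

$$
\omega_m = \begin{cases} 2\pi m & \text{for } m \le (M \div 2), \\ 2\pi (m - M) & \text{for } m > (M \div 2), \end{cases} \tag{26.18}
$$

where ÷ denotes the quotient (i.e., integer division). Alternatively, we could write Eqn. (26.17) in the form

$$
x(t) = \sum_{m=-(M-1)\div 2}^{M \div 2} \big[\mathrm{Re}(G_{m \bmod M}) \cdot \cos(2\pi m t) - \mathrm{Im}(G_{m \bmod M}) \cdot \sin(2\pi m t)\big], \tag{26.19}
$$

$$
y(t) = \sum_{m=-(M-1)\div 2}^{M \div 2} \big[\mathrm{Im}(G_{m \bmod M}) \cdot \cos(2\pi m t) + \mathrm{Re}(G_{m \bmod M}) \cdot \sin(2\pi m t)\big]. \tag{26.20}
$$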


This  formulation  is  used  for  the  purpose  of  shape  reconstruction  from 
Fourier  descriptors  in  Alg.  26.4. 
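As a sketch of Eqns. (26.19) and (26.20), the following Java method evaluates the smooth interpolating curve at an arbitrary path position t by summing over the symmetric frequency range from -(M-1)÷2 to M÷2. Java's integer division matches ÷ here because both operands are non-negative, and `Math.floorMod` supplies the non-negative index m mod M. The class and method names are hypothetical, and the DFT helper repeats Eqn. (26.6) only so that the block is self-contained.

```java
/** Hypothetical sketch of the smooth contour interpolation in Eqns. (26.19) and (26.20). */
final class SmoothContourInterpolation {

    /** Forward DFT with 1/M scaling (Eqn. (26.6)); complex values are {re, im} pairs. */
    static double[][] dft(double[][] g) {
        int M = g.length;
        double[][] G = new double[M][2];
        for (int m = 0; m < M; m++) {
            double a = 0, b = 0;
            for (int k = 0; k < M; k++) {
                double phi = 2 * Math.PI * m * k / M;
                a += g[k][0] * Math.cos(phi) + g[k][1] * Math.sin(phi);
                b += -g[k][0] * Math.sin(phi) + g[k][1] * Math.cos(phi);
            }
            G[m][0] = a / M;
            G[m][1] = b / M;
        }
        return G;
    }

    /** Point (x(t), y(t)) on the interpolating curve for a continuous path position t. */
    static double[] pointAt(double[][] G, double t) {
        int M = G.length;
        double x = 0, y = 0;
        for (int m = -((M - 1) / 2); m <= M / 2; m++) {
            double[] c = G[Math.floorMod(m, M)];      // G_{m mod M}, also valid for negative m
            double phi = 2 * Math.PI * m * t;
            x += c[0] * Math.cos(phi) - c[1] * Math.sin(phi);
            y += c[1] * Math.cos(phi) + c[0] * Math.sin(phi);
        }
        return new double[] {x, y};
    }

    public static void main(String[] args) {
        double[][] g = {{0, 0}, {3, 0}, {4, 2}, {2, 4}, {-1, 2}};
        double[][] G = dft(g);
        int steps = 20;                                // number of interpolated points to print
        for (int j = 0; j < steps; j++) {
            double t = (double) j / steps;
            double[] p = pointAt(G, t);
            System.out.printf("t=%.2f: (%.3f, %.3f)%n", t, p[0], p[1]);
        }
    }
}
```

At t = k/M the sum reproduces the sample g_k (up to rounding), because the frequencies -(M-1)÷2, ..., M÷2 form a contiguous range of M values.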

Figure 26.2 shows the reconstruction of the discrete contour points as well as the calculation of a continuous outline from the DFT spectrum obtained from a sequence of discrete contour positions. The original sample points were taken at M = 25 uniformly spaced positions along the region's contour. The discrete points in Fig. 26.2(b) are exactly reconstructed from the complete DFT spectrum, as specified in Eqn. (26.15). The interpolated (green) outline in Fig. 26.2(c) was calculated with Eqn. (26.15) for continuous positions, based on the frequencies m = 0, ..., M-1. The oscillations of the resulting curve are caused by the high-frequency components. Note that the curve still passes exactly through each of the original sample points; in fact, these can be perfectly reconstructed from any contiguous range of M coefficients and the corresponding harmonic frequencies. The smooth interpolation in Fig. 26.2(d), based on the symmetric low-frequency coefficients m = -(M-1)÷2, ..., M÷2 (see Eqn. (26.20)), shows no such oscillations, since no high-frequency components are included.


26.2.3  Periodicity  of  the  DFT  Spectrum 

When we apply the DFT, we implicitly assume that both the signal vector g = (g_0, g_1, ..., g_{M-1}) and the spectral vector G = (G_0, G_1, ..., G_{M-1}) represent discrete, periodic functions of infinite extent

⁹ Due to the periodicity of the discrete spectrum, any summation over M successive frequencies can be used to reconstruct the original discrete x/y samples. However, a smooth interpolation between the discrete x/y samples can only be obtained from the set of lowest frequencies in the range [-(M-1)÷2, M÷2] centered around the zero frequency, as in Eqns. (26.17) and (26.20).

Fig. 26.2

Contour reconstruction by inverse DFT. Original image (a), M = 25 uniformly spaced sample points on the region's contour (b). Continuous contour (green line) reconstructed by using frequencies ω_m with m = 0, ..., 24 (c). Note that despite the oscillations introduced by the high frequencies, the continuous contour passes exactly through the original sample points. Smooth interpolation reconstructed with Eqn. (26.17) from the lowest-frequency coefficients in the symmetric range m = -12, ..., +12 (d).
Fig. 26.3

Applying the DFT to a complex-valued vector g of length M yields the complex-valued spectrum G that is also of length M. The DFT spectrum is infinite and periodic with M, thus G_{-m} = G_{M-m}, as illustrated by the centered representation of the DFT spectrum (bottom). ω at the bottom denotes the harmonic number (multiple of the fundamental frequency) associated with each coefficient.


(see [39, Ch. 13] for details). Due to this periodicity, G(0) = G(M), G(1) = G(M+1), etc. In general,

$$
G(q \cdot M + m) = G(m) \quad\text{and}\quad G(m) = G(m \bmod M), \tag{26.21}
$$

for arbitrary integers q, m ∈ ℤ. Also, since (-m mod M) = (M - m) mod M, we can state that

$$
G(-m) = G(M - m), \tag{26.22}
$$

for any m ∈ ℤ, such that G(-1) = G(M-1), G(-2) = G(M-2), etc., as illustrated in Fig. 26.3.
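When the periodic indexing of Eqns. (26.21) and (26.22) is implemented in Java, note that the `%` operator can return a negative result for a negative left operand, whereas `Math.floorMod` always yields a value in [0, M) for positive M. A minimal sketch with a hypothetical class name:

```java
/** Hypothetical helper: periodic access G(m) = G(m mod M) for any integer m (Eqn. (26.21)). */
final class PeriodicSpectrum {

    static double[] coefficient(double[][] G, int m) {
        return G[Math.floorMod(m, G.length)];     // non-negative index even for m < 0
    }

    public static void main(String[] args) {
        System.out.println(-1 % 5);               // prints -1: unusable as an array index
        System.out.println(Math.floorMod(-1, 5)); // prints 4, i.e., G(-1) = G(M - 1) for M = 5
        double[][] G = {{1, 0}, {2, 0}, {3, 0}, {4, 0}, {5, 0}};
        System.out.println(coefficient(G, -2)[0]); // prints 4.0, the real part stored in G(3)
    }
}
```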
Fig. 26.4

Truncating a DFT spectrum from M = 11 to M' = 7 coefficients, as specified in Eqns. (26.23) and (26.24). Coefficients G_4, ..., G_7 are discarded (M' ÷ 2 = 3). Note that the associated harmonic number ω remains the same for each coefficient.

26.2.4 Truncating the DFT Spectrum


In  the  original  formulation  in  Eqns.  (26.6)-(26.8),  the  DFT  is  applied 
to  a  signal  g  of  length  M  and  yields  a  discrete  Fourier  spectrum  G 
with  M  coefficients.  Thus  the  signal  and  the  spectrum  have  the  same 
length.  For  shape  representation,  it  is  often  useful  to  work  with 
a  truncated  spectrum,  that  is,  a  reduced  number  of  low-frequency 
Fourier  coefficients. 

By  truncating  a  spectrum  we  mean  the  removal  of  coefficients 
above  a  certain  harmonic  number,  which  are  (considering  positive 
and  negative  frequencies)  located  around  the  center  of  the  coefficient 
vector.  Truncating  a  given  spectrum  G  of  length  |G|  =  M  to  a 
shorter  spectrum  G'  of  length  M'  <  M  is  done  as 


$$
G'(m) \leftarrow \begin{cases} G(m) & \text{for } 0 \le m \le M' \div 2, \\ G(M - M' + m) & \text{for } M' \div 2 < m < M', \end{cases} \tag{26.23}
$$

or simply

$$
G'(m \bmod M') \leftarrow G(m \bmod M), \tag{26.24}
$$

for (M'÷2 - M' + 1) ≤ m ≤ (M'÷2). This works for M and M' being even or odd. The example in Fig. 26.4 illustrates how an original DFT spectrum G of length M = 11 is truncated to G' with only M' = 7 coefficients.
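Eqn. (26.24) translates almost literally into code once a non-negative modulo is available. In this hedged Java sketch (class and parameter names are hypothetical; `newLength` plays the role of M'), each retained coefficient is copied to its new position. Running it with M = 11 and M' = 7 gives the mapping of Fig. 26.4: G'(0), ..., G'(3) come from G(0), ..., G(3), and G'(4), ..., G'(6) from G(8), ..., G(10).

```java
/** Hypothetical sketch of Eqn. (26.24): truncating a DFT spectrum to its M' lowest frequencies. */
final class SpectrumTruncation {

    /** G'(m mod M') = G(m mod M) for (M' div 2 - M' + 1) <= m <= (M' div 2). */
    static double[][] truncate(double[][] G, int newLength) {
        int M = G.length;
        if (newLength < 1 || newLength > M) {
            throw new IllegalArgumentException("need 1 <= M' <= M");
        }
        double[][] Gt = new double[newLength][];
        for (int m = newLength / 2 - newLength + 1; m <= newLength / 2; m++) {
            Gt[Math.floorMod(m, newLength)] = G[Math.floorMod(m, M)].clone();
        }
        return Gt;
    }

    public static void main(String[] args) {
        double[][] G = new double[11][2];
        for (int i = 0; i < G.length; i++) {
            G[i][0] = i;                               // tag each coefficient with its original index
        }
        double[][] Gt = truncate(G, 7);
        for (int i = 0; i < Gt.length; i++) {
            System.out.printf("G'(%d) = G(%d)%n", i, (int) Gt[i][0]);
        }
    }
}
```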

Of course it is also possible to calculate the truncated spectrum directly from the contour samples, without going through the full DFT spectrum. With M being the length of the signal vector g and M' < M the desired length of the (truncated) spectrum G', Eqn. (26.6) modifies to

$$
G'(m \bmod M') = \frac{1}{M} \sum_{k=0}^{M-1} g_k \cdot e^{-i \cdot 2\pi m \cdot \frac{k}{M}}, \tag{26.25}
$$

for m in the same range as in Eqn. (26.24). This approach is more efficient than truncating the complete spectrum, since unneeded coefficients are never calculated. Algorithm 26.3, which is a modified version of Alg. 26.2, summarizes the steps we have described.
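Following Eqn. (26.25) and the pseudocode of Alg. 26.3, the truncated descriptor can be computed without the full spectrum. The Java sketch below is an illustration only (hypothetical names; complex values as {re, im} pairs): it loops over the same symmetric range of m and stores each coefficient at index m mod M'.

```java
/** Hypothetical sketch of Eqn. (26.25): only the M' lowest-frequency coefficients are computed. */
final class TruncatedFourierDescriptor {

    static double[][] fourierDescriptorUniform(double[][] g, int newLength) {
        int M = g.length;
        if (newLength < 1 || newLength > M) {
            throw new IllegalArgumentException("need 1 <= M' <= M");
        }
        double[][] G = new double[newLength][2];
        for (int m = newLength / 2 - newLength + 1; m <= newLength / 2; m++) {
            double a = 0, b = 0;                        // real and imaginary part of G_m
            for (int k = 0; k < M; k++) {
                double phi = 2 * Math.PI * m * k / M;
                double x = g[k][0], y = g[k][1];
                a += x * Math.cos(phi) + y * Math.sin(phi);
                b += -x * Math.sin(phi) + y * Math.cos(phi);
            }
            int idx = Math.floorMod(m, newLength);      // store G_m at position m mod M'
            G[idx][0] = a / M;
            G[idx][1] = b / M;
        }
        return G;
    }

    public static void main(String[] args) {
        int M = 16;
        double[][] g = new double[M][2];
        for (int k = 0; k < M; k++) {                   // clockwise unit circle around (2, 3)
            double phi = 2 * Math.PI * k / M;
            g[k][0] = 2 + Math.cos(phi);
            g[k][1] = 3 - Math.sin(phi);
        }
        double[][] G = fourierDescriptorUniform(g, 5);
        for (int i = 0; i < G.length; i++) {
            System.out.printf("G'(%d) = (%.4f, %.4f)%n", i, G[i][0], G[i][1]);
        }
    }
}
```

For this clockwise circle, only G'(0) (the center) and G'(4), which holds G_{-1}, are non-zero up to rounding.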

Since some of the coefficients are missing, it is not possible to reconstruct the original signal vector g from the truncated DFT spectrum G'. However, the calculation of a partial reconstruction is possible, for example, using the formulation in Eqn. (26.20). In this case, the discarded (high-frequency) coefficients are simply assumed to have zero values (see Sec. 26.3.6 for more details).

1: FourierDescriptorUniform(g, M')
Input: g = (g_0, ..., g_{M-1}), a sequence of M complex values, representing regularly sampled 2D points along a contour path. M', the number of Fourier coefficients (M' ≤ M).
Returns a truncated Fourier descriptor G of length M'.
2: M ← |g|
3: Create map G : [0, M'-1] ↦ ℂ
4: for m ← (M'÷2 - M' + 1), ..., (M'÷2) do
5: (A, B) ← (0, 0) ▷ real/imag. part of coefficient G_m
6: for k ← 0, ..., M-1 do
7: g ← g(k)
8: x ← Re(g), y ← Im(g)
9: φ ← 2·π·m·k/M
10: A ← A + x·cos(φ) + y·sin(φ) ▷ Eq. 26.10
11: B ← B - x·sin(φ) + y·cos(φ)
12: G(m mod M') ← (1/M)·(A + i·B)
13: return G.

Alg. 26.3

Calculating a truncated Fourier descriptor for a sequence of uniformly sampled contour points (adapted from Alg. 26.2). The M complex-valued contour points in g represent 2D positions sampled uniformly along the contour path. The resulting Fourier descriptor G contains only M' coefficients for the M' lowest harmonic frequencies.


26.3 Geometric Interpretation of Fourier Coefficients

The contour reconstructed by the inverse transformation (Eqn. (26.15)) is the sum of M terms, one for each Fourier coefficient G_m = (A_m, B_m). Each of these M terms represents a particular 2D shape in the spatial domain, and the original contour can be obtained by point-wise addition of the individual shapes. So what are the spatial shapes that correspond to the individual Fourier coefficients?


26.3.1 Coefficient G_0 Corresponds to the Contour's Centroid

We first look only at the specific Fourier coefficient G_0 with frequency index m = 0. Substituting m = 0 and ω_0 = 0 in Eqn. (26.10), we get

$$
A_0 = \frac{1}{M} \sum_{k=0}^{M-1} \big[x_k \cdot \cos(0) + y_k \cdot \sin(0)\big] = \frac{1}{M} \sum_{k=0}^{M-1} \big[x_k \cdot 1 + y_k \cdot 0\big] \tag{26.26}
$$

$$
= \frac{1}{M} \sum_{k=0}^{M-1} x_k = \bar{x}, \tag{26.27}
$$

$$
B_0 = \frac{1}{M} \sum_{k=0}^{M-1} \big[y_k \cdot \cos(0) - x_k \cdot \sin(0)\big] = \frac{1}{M} \sum_{k=0}^{M-1} \big[y_k \cdot 1 - x_k \cdot 0\big] \tag{26.28}
$$

$$
= \frac{1}{M} \sum_{k=0}^{M-1} y_k = \bar{y}. \tag{26.29}
$$


Thus G_0 = (A_0, B_0) = (x̄, ȳ) is simply the average of the x/y-coordinates, that is, the centroid of the original contour points g_k (see Fig. 26.5).¹⁰ If we apply the inverse Fourier transform (Eqn. (26.15)) by ignoring (i.e., zeroing) all coefficients except G_0, we get the partial reconstruction¹¹ of the 2D contour coordinates as

$$
x_k^{(0)} = A_0 \cdot \cos(\omega_0 \tfrac{k}{M}) - B_0 \cdot \sin(\omega_0 \tfrac{k}{M}) \tag{26.30}
$$

$$
= \bar{x} \cdot \cos(0) - \bar{y} \cdot \sin(0) = \bar{x} \cdot 1 - \bar{y} \cdot 0 = \bar{x}, \tag{26.31}
$$

$$
y_k^{(0)} = B_0 \cdot \cos(\omega_0 \tfrac{k}{M}) + A_0 \cdot \sin(\omega_0 \tfrac{k}{M}) \tag{26.32}
$$

$$
= \bar{y} \cdot \cos(0) + \bar{x} \cdot \sin(0) = \bar{y} \cdot 1 + \bar{x} \cdot 0 = \bar{y}. \tag{26.33}
$$

Fig. 26.5

DFT coefficient G_0 corresponds to the centroid of the contour points.


Thus the contribution of the spectral value G_0 is the centroid of the reconstructed shape (see Fig. 26.5). If we perform a partial reconstruction of the contour using only the spectral coefficient G_0, then all contour points

$$
g_0^{(0)} = g_1^{(0)} = \ldots = g_{M-1}^{(0)} = (\bar{x}, \bar{y}) \tag{26.34}
$$

would have the same (centroid) coordinate. This is because G_0 is the coefficient for the zero frequency and thus the sine and cosine terms in Eqns. (26.27) and (26.29) are constant. Alternatively, if we reconstruct the signal by omitting G_0 (i.e., g^{(1,...,M-1)}), the resulting contour is identical to the original shape, except that it is centered at the coordinate origin.
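A small Java sketch of this section, with hypothetical names: G_0 is computed as the coordinate average (Eqns. (26.26)-(26.29)), and the reconstruction without G_0 is obtained by subtracting G_0 from every point. The second step uses one extra observation: the inverse transform is a sum, and its G_0 term equals G_0 · e^0 = G_0 for every k, so dropping that term shifts the whole contour by -G_0.

```java
/** Hypothetical sketch: G_0 as centroid, and the reconstruction without G_0 (Sec. 26.3.1). */
final class CentroidCoefficient {

    /** G_0 = (1/M) * sum_k g_k, i.e., the average of the x and y coordinates. */
    static double[] g0(double[][] g) {
        double sx = 0, sy = 0;
        for (double[] p : g) {
            sx += p[0];
            sy += p[1];
        }
        return new double[] {sx / g.length, sy / g.length};
    }

    /** Reconstruction from G_1..G_{M-1}: every point shifted by -G_0, so the centroid moves to (0, 0). */
    static double[][] withoutG0(double[][] g) {
        double[] c = g0(g);
        double[][] r = new double[g.length][2];
        for (int k = 0; k < g.length; k++) {
            r[k][0] = g[k][0] - c[0];
            r[k][1] = g[k][1] - c[1];
        }
        return r;
    }

    public static void main(String[] args) {
        double[][] g = {{0, 0}, {4, 0}, {4, 2}, {0, 2}};
        double[] c = g0(g);
        System.out.printf("G_0 = (%.1f, %.1f)%n", c[0], c[1]);
    }
}
```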


¹⁰ Note that the centroid of a boundary is generally not the same as the centroid of the enclosed region.

¹¹ We use the notation g^{(m)} = (g_0^{(m)}, g_1^{(m)}, ..., g_{M-1}^{(m)}) for the partial reconstruction of the contour g from only a single Fourier coefficient G_m. For example, g^{(0)} is the reconstruction from the zero-frequency coefficient G_0 only. Analogously, we use g^{(a,b,c)} to denote a partial reconstruction based on selected Fourier coefficients G_a, G_b, G_c.

26.3.2 Coefficient G_1 Corresponds to a Circle

Next, we look at the geometric interpretation of G_1 = (A_1, B_1), that is, the coefficient with frequency index m = 1, which corresponds to the angular frequency ω_1 = 2π. Assuming that all coefficients G_m in the DFT spectrum are set to zero, except the single coefficient G_1, we get the partially reconstructed contour points g^{(1)} by Eqn. (26.11) as

$$
g_k^{(1)} = G_1 \cdot e^{\,i \cdot 2\pi \cdot \frac{k}{M}} \tag{26.35}
$$

$$
= \big[A_1 + i \cdot B_1\big] \cdot \big[\cos(2\pi \tfrac{k}{M}) + i \cdot \sin(2\pi \tfrac{k}{M})\big], \tag{26.36}
$$


for 0 ≤ k < M. Remember that the complex values of e^{iφ} describe a unit circle in the complex plane that performs one full (counter-clockwise) revolution as the angle φ runs from 0 to 2π. Analogously, e^{i2πt} also describes a complete unit circle as t goes from 0 to 1. Since the term k/M (for 0 ≤ k < M) also varies from 0 to 1 in Eqn. (26.36), the M reconstructed contour points are placed on a circle at equal angular steps. Multiplying e^{i2πt} by a complex factor z stretches the radius of the circle by |z| and also changes the phase (starting angle) of the circle by an angle θ, that is,

$$
z \cdot e^{i\varphi} = |z| \cdot e^{i\theta} \cdot e^{i\varphi} = |z| \cdot e^{i(\varphi + \theta)}, \tag{26.37}
$$

with θ = ∠z = arg(z) = tan^{-1}(Im(z)/Re(z)).

We now see that the points g_k^{(1)} = G_1 · e^{i2πk/M}, generated by Eqn. (26.36), are positioned uniformly on a circle with radius r_1 = |G_1| and starting angle (phase)

$$
\theta_1 = \angle G_1 = \tan^{-1}\Big(\frac{\mathrm{Im}(G_1)}{\mathrm{Re}(G_1)}\Big) = \tan^{-1}\Big(\frac{B_1}{A_1}\Big). \tag{26.38}
$$

This point sequence is traversed in counter-clockwise direction for k = 0, ..., M-1 at frequency m = 1, that is, the circle performs one full revolution while the contour is traversed once. The circle is centered at the coordinate origin (0, 0), its radius is |G_1|, and its starting point (Eqn. (26.36) for k = 0) is

$$
g_0^{(1)} = G_1 \cdot e^{\,i \cdot 2\pi \cdot 1 \cdot \frac{0}{M}} = G_1 \cdot e^{0} = G_1 \cdot 1 = G_1, \tag{26.39}
$$

as illustrated in Fig. 26.6.


26.3.3 Coefficient G_m Corresponds to a Circle with Frequency m

Based on the aforementioned result for the frequency index m = 1, we can easily generalize the geometric interpretation of Fourier coefficients with arbitrary index m > 0. Using Eqn. (26.11), the partial reconstruction for the single Fourier coefficient G_m = (A_m, B_m) is the contour g^{(m)} with coordinates

$$
g_k^{(m)} = G_m \cdot e^{\,i \cdot 2\pi m \cdot \frac{k}{M}} \tag{26.40}
$$

$$
= \big[A_m + i \cdot B_m\big] \cdot \big[\cos(2\pi m \tfrac{k}{M}) + i \cdot \sin(2\pi m \tfrac{k}{M})\big], \tag{26.41}
$$

which again describe a circle with radius r_m = |G_m|, phase θ_m = arg(G_m) = tan^{-1}(B_m/A_m), and starting point g_0^{(m)} = G_m. In this case, however, the angular velocity is scaled by m, that is, the resulting circle revolves m times faster than the circle for G_1. In other words, while the contour is traversed once, this circle performs m full revolutions.

Fig. 26.6

A single DFT coefficient corresponds to a circle. The partial reconstruction from the single DFT coefficient G_m yields a sequence of M points g_0^{(m)}, ..., g_{M-1}^{(m)} on a circle centered at the coordinate origin, with radius r_m and starting angle (phase) θ_m.
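To see the circle of Eqn. (26.40) numerically, the following hedged Java sketch (hypothetical names) reconstructs the M points of g^{(m)} from a single coefficient and prints each point's radius and angle. The radius stays at |G_m|, the first point equals G_m, and the angle advances by 2πm/M per step. The angles are computed with `Math.atan2` rather than tan^{-1}(B/A), so they land in the correct quadrant.

```java
/** Hypothetical sketch of Eqn. (26.40): partial reconstruction from one coefficient G_m. */
final class CoefficientCircle {

    /** Returns the M points G_m * e^{i 2 pi m k / M}, k = 0..M-1, as {x, y} pairs. */
    static double[][] reconstruct(double[] Gm, int m, int M) {
        double[][] pts = new double[M][2];
        for (int k = 0; k < M; k++) {
            double phi = 2 * Math.PI * m * k / M;
            pts[k][0] = Gm[0] * Math.cos(phi) - Gm[1] * Math.sin(phi);
            pts[k][1] = Gm[1] * Math.cos(phi) + Gm[0] * Math.sin(phi);
        }
        return pts;
    }

    public static void main(String[] args) {
        double[] Gm = {0.4, 1.6};                      // example coefficient A_m + i*B_m
        double[][] pts = reconstruct(Gm, 3, 12);       // m = 3: three revolutions for 12 points
        for (int k = 0; k < pts.length; k++) {
            double r = Math.hypot(pts[k][0], pts[k][1]);
            double theta = Math.toDegrees(Math.atan2(pts[k][1], pts[k][0]));
            System.out.printf("k=%2d: radius %.4f, angle %7.2f deg%n", k, r, theta);
        }
    }
}
```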

Note that G_0 (see Sec. 26.3.1) does not really constitute a special case at all. Formally, it also describes a circle, but one that oscillates with zero frequency, that is, all points have the same (constant) position

$$
g_k^{(0)} = G_0 \cdot e^{\,i \cdot \omega_0 \cdot \frac{k}{M}} = G_0 \cdot e^{\,i \cdot 2\pi \cdot 0 \cdot \frac{k}{M}} = G_0 \cdot e^{0} = G_0, \tag{26.42}
$$

for k = 0, ..., M-1, which is equivalent to the curve's centroid G_0 = (x̄, ȳ), as shown in Eqns. (26.27)-(26.29). Since the corresponding frequency is zero, the point never moves away from G_0.

26.3.4  Negative  Frequencies 

The DFT spectrum is periodic and defined for all frequencies m ∈ ℤ, including negative frequencies. From Eqn. (26.21) we know that for any DFT coefficient with negative index G_{-m} there is an equivalent coefficient G_n whose index n is in the range 0, ..., M-1. The partial reconstruction from the single coefficient G_{-m} is

$$
g_k^{(-m)} = G_{-m} \cdot e^{-i \cdot 2\pi m \cdot \frac{k}{M}} = G_n \cdot e^{-i \cdot 2\pi m \cdot \frac{k}{M}}, \tag{26.43}
$$

with n = -m mod M, which is again a sequence of points on the circle with radius r_{-m} = r_n = |G_n| and phase θ_{-m} = θ_n = arg(G_n). The absolute rotation frequency is m, but this circle spins in the opposite (i.e., clockwise) direction, since the angles become increasingly negative with growing k.
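The direction of rotation can be checked with a cross product. In this hedged Java sketch (hypothetical names), `pointAt` evaluates the continuous circle G · e^{i2πmt} for a signed frequency m, and `direction` returns the sign of the z-component of the cross product of two nearby points: +1 for counter-clockwise and -1 for clockwise motion. For a small step dt with |2πm·dt| < π this sign equals the sign of m. At the discrete positions t = k/M, frequencies -m and M - m produce the same points, yet the continuous circles turn in opposite directions.

```java
/** Hypothetical sketch: circles for positive and negative frequencies (Sec. 26.3.4). */
final class NegativeFrequencyCircle {

    /** Point G * e^{i 2 pi m t} on the circle for signed frequency m at path position t. */
    static double[] pointAt(double[] G, int m, double t) {
        double phi = 2 * Math.PI * m * t;
        return new double[] {
            G[0] * Math.cos(phi) - G[1] * Math.sin(phi),
            G[1] * Math.cos(phi) + G[0] * Math.sin(phi)
        };
    }

    /** +1 if the circle turns counter-clockwise between t and t + dt, -1 if clockwise. */
    static int direction(double[] G, int m, double t, double dt) {
        double[] p = pointAt(G, m, t);
        double[] q = pointAt(G, m, t + dt);
        return (int) Math.signum(p[0] * q[1] - p[1] * q[0]); // z-component of p x q
    }

    public static void main(String[] args) {
        double[] G = {1.0, 0.5};
        System.out.println("m = +2: " + direction(G, 2, 0.0, 0.01));
        System.out.println("m = -2: " + direction(G, -2, 0.0, 0.01));
    }
}
```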

26.3.5  Fourier  Descriptor  Pairs  Correspond  to  Ellipses 

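It follows therefore that the space-domain circles for the Fourier coefficients G_m and G_{-m} rotate with the same absolute frequency m but with different phase angles θ_m, θ_{-m} and in opposite directions. We denote the tuple

$$
FP_m = (G_{-m}, G_{+m})
$$

the "Fourier descriptor pair" (or "FD pair") for the frequency index m. If we perform a partial reconstruction from only the two Fourier coefficients G_{-m}, G_{+m} of this FD pair, we obtain the spatial points

$$
\begin{aligned}
g_k^{(\pm m)} &= g_k^{(-m)} + g_k^{(+m)} \\
&= G_{-m} \cdot e^{-i \cdot 2\pi m \cdot \frac{k}{M}} + G_{+m} \cdot e^{\,i \cdot 2\pi m \cdot \frac{k}{M}} \\
&= G_{-m} \cdot e^{-i \cdot \omega_m \cdot \frac{k}{M}} + G_{+m} \cdot e^{\,i \cdot \omega_m \cdot \frac{k}{M}}.
\end{aligned} \tag{26.44}
$$

By Eqn. (26.15) we can expand the result from Eqn. (26.44) to Cartesian x/y coordinates as¹²

$$
\begin{aligned}
x_k^{(\pm m)} &= A_{-m} \cdot \cos(-\omega_m \tfrac{k}{M}) - B_{-m} \cdot \sin(-\omega_m \tfrac{k}{M}) + A_m \cdot \cos(\omega_m \tfrac{k}{M}) - B_m \cdot \sin(\omega_m \tfrac{k}{M}) \\
&= (A_{-m} + A_m) \cdot \cos(\omega_m \tfrac{k}{M}) + (B_{-m} - B_m) \cdot \sin(\omega_m \tfrac{k}{M}),
\end{aligned} \tag{26.45}
$$

$$
\begin{aligned}
y_k^{(\pm m)} &= B_{-m} \cdot \cos(-\omega_m \tfrac{k}{M}) + A_{-m} \cdot \sin(-\omega_m \tfrac{k}{M}) + B_m \cdot \cos(\omega_m \tfrac{k}{M}) + A_m \cdot \sin(\omega_m \tfrac{k}{M}) \\
&= (B_{-m} + B_m) \cdot \cos(\omega_m \tfrac{k}{M}) - (A_{-m} - A_m) \cdot \sin(\omega_m \tfrac{k}{M}),
\end{aligned} \tag{26.46}
$$

for k = 0, ..., M-1. The 2D point sequence g^{(±m)} = (g_0^{(±m)}, ..., g_{M-1}^{(±m)}), obtained with Eqns. (26.45) and (26.46), describes an oriented ellipse that is centered at the origin (see Fig. 26.7). The parametric equation for this ellipse is

$$
\begin{aligned}
x_t^{(\pm m)} &= (A_{-m} + A_m) \cdot \cos(\omega_m t) + (B_{-m} - B_m) \cdot \sin(\omega_m t) \\
&= (A_{-m} + A_m) \cdot \cos(2\pi m t) + (B_{-m} - B_m) \cdot \sin(2\pi m t),
\end{aligned} \tag{26.47}
$$

$$
\begin{aligned}
y_t^{(\pm m)} &= (B_{-m} + B_m) \cdot \cos(\omega_m t) - (A_{-m} - A_m) \cdot \sin(\omega_m t) \\
&= (B_{-m} + B_m) \cdot \cos(2\pi m t) - (A_{-m} - A_m) \cdot \sin(2\pi m t),
\end{aligned} \tag{26.48}
$$

for t = 0, ..., 1.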
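The following hedged Java sketch (hypothetical names) evaluates Eqns. (26.47) and (26.48) for one FD pair and, for comparison, the point-wise sum of the two counter-rotating circles G_{-m}·e^{-iω_m t} + G_{+m}·e^{iω_m t}, which is Eqn. (26.44) with k/M replaced by t. Both routes give the same ellipse point; at t = 0 it is G_{-m} + G_{+m}.

```java
/** Hypothetical sketch: ellipse points of a Fourier descriptor pair FP_m = (G_{-m}, G_{+m}). */
final class DescriptorPairEllipse {

    /** Ellipse point at path position t via Eqns. (26.47) and (26.48); coefficients are {A, B}. */
    static double[] ellipsePoint(double[] gMinus, double[] gPlus, int m, double t) {
        double c = Math.cos(2 * Math.PI * m * t);
        double s = Math.sin(2 * Math.PI * m * t);
        double x = (gMinus[0] + gPlus[0]) * c + (gMinus[1] - gPlus[1]) * s;
        double y = (gMinus[1] + gPlus[1]) * c - (gMinus[0] - gPlus[0]) * s;
        return new double[] {x, y};
    }

    /** Same point as the sum of the clockwise circle of G_{-m} and the counter-clockwise circle of G_{+m}. */
    static double[] sumOfCircles(double[] gMinus, double[] gPlus, int m, double t) {
        double c = Math.cos(2 * Math.PI * m * t);
        double s = Math.sin(2 * Math.PI * m * t);
        // G_{-m} * e^{-i w t} = (A + iB)(c - is)
        double x1 = gMinus[0] * c + gMinus[1] * s, y1 = gMinus[1] * c - gMinus[0] * s;
        // G_{+m} * e^{+i w t} = (A + iB)(c + is)
        double x2 = gPlus[0] * c - gPlus[1] * s, y2 = gPlus[1] * c + gPlus[0] * s;
        return new double[] {x1 + x2, y1 + y2};
    }

    public static void main(String[] args) {
        double[] gMinus = {-2.0, 0.5}, gPlus = {0.4, 1.6};   // the FD pair used in Fig. 26.8
        for (int j = 0; j <= 8; j++) {
            double t = j / 8.0;
            double[] p = ellipsePoint(gMinus, gPlus, 1, t);
            System.out.printf("t=%.3f: (%.4f, %.4f)%n", t, p[0], p[1]);
        }
    }
}
```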


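Ellipse parameters

In general, the parametric equation of an ellipse with radii a, b, centered at (x_c, y_c) and oriented at an angle α is

$$
\begin{aligned}
x(\psi) &= x_c + a \cdot \cos(\psi) \cdot \cos(\alpha) - b \cdot \sin(\psi) \cdot \sin(\alpha), \\
y(\psi) &= y_c + a \cdot \cos(\psi) \cdot \sin(\alpha) + b \cdot \sin(\psi) \cdot \cos(\alpha),
\end{aligned} \tag{26.49}
$$

with ψ = 0, ..., 2π. From Eqns. (26.45) and (26.46) we see that the parameters a_m, b_m, α_m of the ellipse for a single Fourier descriptor pair FP_m = (G_{-m}, G_{+m}) are

$$
a_m = r_{-m} + r_{+m} = |G_{-m}| + |G_{+m}|, \tag{26.50}
$$

$$
b_m = |r_{-m} - r_{+m}| = \big|\, |G_{-m}| - |G_{+m}| \,\big|, \tag{26.51}
$$

$$
\alpha_m = \tfrac{1}{2} \cdot (\angle G_{-m} + \angle G_{+m}) = \tfrac{1}{2} \cdot (\theta_{-m} + \theta_{+m}) = \tfrac{1}{2} \cdot \Big[\tan^{-1}\Big(\frac{B_{-m}}{A_{-m}}\Big) + \tan^{-1}\Big(\frac{B_{+m}}{A_{+m}}\Big)\Big]. \tag{26.52}
$$

¹² Using the relations sin(-a) = -sin(a) and cos(-a) = cos(a).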




Fig. 26.7

DFT coefficients G_{-m}, G_{+m} form a Fourier descriptor pair FP_m. Each of the two descriptors corresponds to M points on a circle of radius r_{-m}, r_{+m} and phase θ_{-m}, θ_{+m}, respectively, revolving with the same frequency m but in opposite directions. The sum of each point pair is located on an ellipse with radii a_m, b_m and orientation α_m. The orientation α_m of the ellipse's major axis is centered between the starting angles of the circles defined by G_{-m} and G_{+m}; its radii are a_m = r_{-m} + r_{+m} for the major axis and b_m = |r_{-m} - r_{+m}| for the minor axis. The figure shows the situation for m = 1.


Like its constituting circles, this ellipse is centered at (x_c, y_c) = (0, 0) and performs m revolutions for one traversal of the contour. G_{-m} specifies the circle

$$
z_{-m}(\varphi) = G_{-m} \cdot e^{\,i(-\varphi)} = r_{-m} \cdot e^{\,i(\theta_{-m} - \varphi)}, \tag{26.53}
$$

for φ ∈ [0, 2π], with starting angle θ_{-m} and radius r_{-m}, rotating in a clockwise direction. Similarly, G_{+m} specifies the circle

$$
z_{+m}(\varphi) = G_{+m} \cdot e^{\,i(\varphi)} = r_{+m} \cdot e^{\,i(\theta_{+m} + \varphi)}, \tag{26.54}
$$

with starting angle θ_{+m} and radius r_{+m}, rotating in a counter-clockwise direction. Both circles thus rotate at the same angular velocity but in opposite directions, as mentioned before. The corresponding (complex-valued) ellipse points are

$$
z_m(\varphi) = z_{-m}(\varphi) + z_{+m}(\varphi). \tag{26.55}
$$

The ellipse radius |z_m(φ)| is a maximum at position φ = φ_max, where the angles on both circles are identical (i.e., the corresponding vectors have the same direction). This occurs when

$$
\theta_{-m} - \varphi_{\max} = \theta_{+m} + \varphi_{\max} \quad\text{or}\quad \varphi_{\max} = \tfrac{1}{2} \cdot (\theta_{-m} - \theta_{+m}),
$$

that is, at mid-angle between the two starting angles θ_{-m} and θ_{+m}. Therefore, the orientation of the ellipse's major axis is

$$
\alpha_m = \theta_{+m} + \tfrac{1}{2} \cdot (\theta_{-m} - \theta_{+m}) = \tfrac{1}{2} \cdot (\theta_{-m} + \theta_{+m}), \tag{26.56}
$$
Fig. 26.8

Ellipse created by partial reconstruction from a single Fourier descriptor pair FP_m = (G_{-m}, G_{+m}). The two complex-valued Fourier coefficients G_{-m} = (-2, 0.5) and G_{+m} = (0.4, 1.6) represent circles with starting points G_{-m} and G_{+m}, respectively. The circle for G_{-m} (red) rotates in clockwise direction, the circle for G_{+m} (blue) rotates in counter-clockwise direction. The ellipse (green) is the result of point-wise addition of the two circles, as shown for four successive points, starting with point G_{-m} + G_{+m}.


as already stated in Eqn. (26.52). At φ = φ_max the two radial vectors align, and thus the radius of the ellipse's major axis a_m is the sum of the two circle radii, that is,

$$
a_m = r_{-m} + r_{+m} \tag{26.57}
$$

(cf. Eqn. (26.50)). Analogously, the ellipse radius is minimized at position φ = φ_min, where the vectors z_{-m}(φ_min) and z_{+m}(φ_min) point in opposite directions. This occurs at angle

$$
\varphi_{\min} = \varphi_{\max} + \frac{\pi}{2}, \tag{26.58}
$$

and the corresponding radius for the ellipse's minor axis is (cf. Eqn. (26.51))

$$
b_m = |r_{+m} - r_{-m}|. \tag{26.59}
$$

Figure 26.8 illustrates this situation for a specific Fourier descriptor pair FP_m = (G_{-m}, G_{+m}) = (-2 + i·0.5, 0.4 + i·1.6). Note that the ellipse parameters a_m, b_m, α_m (see Eqns. (26.50)-(26.52)) are not explicitly required for reconstructing (drawing) the contour, since the ellipse can also be generated by simply adding the x/y-coordinates of the two counter-revolving circles for the participating Fourier descriptors, as given in Eqn. (26.55). Another example is shown in Fig. 26.9.
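For the pair used in Fig. 26.8, Eqns. (26.50)-(26.52) can be evaluated directly. The hedged Java sketch below (hypothetical names) uses `Math.atan2` instead of tan^{-1}(B/A), so each phase θ falls in the correct quadrant; because the phases are only defined modulo 2π, the resulting α_m is determined modulo π, which is all an axis orientation requires. The `main` method also compares a_m and b_m with |z_m(φ)| from Eqn. (26.55) at φ_max and at φ_min = φ_max + π/2 (Eqn. (26.58)).

```java
/** Hypothetical sketch of Eqns. (26.50)-(26.52): ellipse parameters of an FD pair. */
final class EllipseFromDescriptorPair {

    /** Returns {a_m, b_m, alpha_m} for G_{-m} = gMinus and G_{+m} = gPlus, given as {A, B}. */
    static double[] parameters(double[] gMinus, double[] gPlus) {
        double rMinus = Math.hypot(gMinus[0], gMinus[1]);
        double rPlus = Math.hypot(gPlus[0], gPlus[1]);
        double thetaMinus = Math.atan2(gMinus[1], gMinus[0]);
        double thetaPlus = Math.atan2(gPlus[1], gPlus[0]);
        double a = rMinus + rPlus;                       // Eqn. (26.50)
        double b = Math.abs(rMinus - rPlus);             // Eqn. (26.51)
        double alpha = 0.5 * (thetaMinus + thetaPlus);   // Eqn. (26.52), modulo pi
        return new double[] {a, b, alpha};
    }

    /** |z_m(phi)| = |G_{-m} e^{-i phi} + G_{+m} e^{i phi}|, cf. Eqn. (26.55). */
    static double radiusAt(double[] gMinus, double[] gPlus, double phi) {
        double c = Math.cos(phi), s = Math.sin(phi);
        double x = gMinus[0] * c + gMinus[1] * s + gPlus[0] * c - gPlus[1] * s;
        double y = gMinus[1] * c - gMinus[0] * s + gPlus[1] * c + gPlus[0] * s;
        return Math.hypot(x, y);
    }

    public static void main(String[] args) {
        double[] gMinus = {-2.0, 0.5}, gPlus = {0.4, 1.6};
        double[] p = parameters(gMinus, gPlus);
        double phiMax = 0.5 * (Math.atan2(gMinus[1], gMinus[0]) - Math.atan2(gPlus[1], gPlus[0]));
        System.out.printf("a = %.4f, b = %.4f, alpha = %.2f deg%n", p[0], p[1], Math.toDegrees(p[2]));
        System.out.printf("|z(phi_max)| = %.4f, |z(phi_max + pi/2)| = %.4f%n",
                radiusAt(gMinus, gPlus, phiMax), radiusAt(gMinus, gPlus, phiMax + Math.PI / 2));
    }
}
```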

26.3.6 Shape Reconstruction from Truncated Fourier Descriptors

Due to the periodicity of the DFT spectrum, the complete reconstruction of the contour points g_k from the Fourier coefficients G_m (see Eqn. (26.11)) could also be written with a different summation range, as long as all spectral coefficients are included, that is,

$$
g_k = \sum_{m=0}^{M-1} G_m \cdot e^{\,i \cdot 2\pi m \cdot \frac{k}{M}} = \sum_{m=m_0}^{m_0+M-1} G_{m \bmod M} \cdot e^{\,i \cdot 2\pi m \cdot \frac{k}{M}}, \tag{26.60}
$$

for any start index m_0 ∈ ℤ. As a special (though important) case we can perform the summation symmetrically around the zero index and write

$$
g_k = \sum_{m=0}^{M-1} G_m \cdot e^{\,i \cdot 2\pi m \cdot \frac{k}{M}} = \sum_{m=-(M-1)\div 2}^{M \div 2} G_{m \bmod M} \cdot e^{\,i \cdot 2\pi m \cdot \frac{k}{M}}. \tag{26.61}
$$

Fig. 26.9

Partial reconstruction from single coefficients and an FD pair. The two circles reconstructed from DFT coefficient G_{-1} (a) and coefficient G_{+1} (b) are positioned at the centroid of the contour (G_0). The combined reconstruction for (G_{-1}, G_0, G_{+1}) produces the ellipse in (c). The dots on the green curves show the path position for t = 0.


To  understand  the  reconstruction  in  terms  of  Fourier  descriptor  pairs, 
it  is  helpful  to  distinguish  if  M  (the  number  of  contour  points  and 
Fourier  coefficients)  is  even  or  odd. 


Odd  number  of  contour  points 

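If M is odd, then the spectrum consists of G_0 (representing the contour's centroid) plus exactly M÷2 Fourier descriptor pairs FP_m, with m = 1, ..., M÷2.¹³ We can thus rewrite Eqn. (26.60) as

$$
\begin{aligned}
g_k &= \sum_{m=0}^{M-1} G_m \cdot e^{\,i \cdot 2\pi m \cdot \frac{k}{M}} = G_0 + \sum_{m=1}^{M \div 2} \Big[G_{-m} \cdot e^{-i \cdot 2\pi m \cdot \frac{k}{M}} + G_m \cdot e^{\,i \cdot 2\pi m \cdot \frac{k}{M}}\Big] \\
&= g_k^{(0)} + \sum_{m=1}^{M \div 2} g_k^{(\pm m)} = g_k^{(0)} + g_k^{(\pm 1)} + g_k^{(\pm 2)} + \cdots + g_k^{(\pm M \div 2)},
\end{aligned} \tag{26.62}
$$

where g_k^{(±m)} denotes the partial reconstruction from the single Fourier descriptor pair FP_m (see Eqn. (26.44)).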

As we already know, the partial reconstruction g^{(±m)} of an individual Fourier descriptor pair FP_m is a set of points on an ellipse that is centered at the origin (0, 0). The partial reconstruction of the three DFT coefficients G_0, G_{-m}, G_{+m} (i.e., FP_m plus the single coefficient G_0) is the point sequence

$$
g_k^{(-m,0,m)} = g_k^{(0)} + g_k^{(\pm m)}, \tag{26.63}
$$

which is the ellipse for g^{(±m)} shifted to g^{(0)} = (x̄, ȳ), the centroid of the original contour. For example, the partial reconstruction from the coefficients G_{-1}, G_0, G_{+1},

$$
g_k^{(-1,0,1)} = g_k^{(0)} + g_k^{(\pm 1)}, \tag{26.64}
$$

yields an ellipse with frequency m = 1 that revolves around the (fixed) centroid of the original contour. If we add another Fourier descriptor pair FP_2, the resulting reconstruction is

$$
g_k^{(-2,\ldots,2)} = g_k^{(0)} + \underbrace{g_k^{(\pm 1)}}_{\text{ellipse 1}} + \underbrace{g_k^{(\pm 2)}}_{\text{ellipse 2}}. \tag{26.65}
$$

The resulting ellipse has the frequency m = 2, but note that it is centered at a moving point on the "slower" ellipse (with frequency m = 1), that is, ellipse 2 effectively "rides" on ellipse 1. If we add FP_3, its ellipse is again centered at a point on ellipse 2, and so on. For an illustration, see the examples in Figs. 26.11 and 26.12. In general, the ellipse for descriptor pair FP_j revolves around the (moving) center obtained as the superposition of j - 1 "slower" ellipses,

$$
g_k^{(0)} + \sum_{m=1}^{j-1} g_k^{(\pm m)}. \tag{26.66}
$$

Consequently, the curve obtained by the partial reconstruction from descriptor pairs FP_1, ..., FP_j (for j ≤ M÷2) is the point sequence

$$
g_k^{(-j,\ldots,j)} = g_k^{(0)} + \sum_{m=1}^{j} g_k^{(\pm m)}, \tag{26.67}
$$

for k = 0, ..., M-1. The fully reconstructed shape is the sum of the centroid (defined by G_0) and M÷2 ellipses, one for each Fourier descriptor pair FP_1, ..., FP_{M÷2}.

¹³ If M is odd, then M = 2·(M÷2) + 1.


Even  number  of  contour  points 

If M is even,¹⁴ then the reconstructed shape is a superposition of the centroid (defined by G_0), (M-1)÷2 ellipses from the Fourier descriptor pairs FP_1, ..., FP_{(M-1)÷2}, plus one additional circle specified by the single (highest-frequency) Fourier coefficient G_{M÷2}. The complete reconstruction from an even-length Fourier descriptor can thus be written as

$$
g_k = \sum_{m=0}^{M-1} G_m \cdot e^{\,i \cdot 2\pi m \cdot \frac{k}{M}} = \underbrace{g_k^{(0)}}_{\text{center}} + \underbrace{\sum_{m=1}^{(M-1)\div 2} g_k^{(\pm m)}}_{(M-1)\div 2 \text{ ellipses}} + \underbrace{g_k^{(M \div 2)}}_{1 \text{ circle}}. \tag{26.68}
$$

The single high-frequency circle associated with G_{M÷2} has its (moving) center at the sum of all lower-frequency ellipses that correspond to the Fourier coefficients G_{-m}, ..., G_{+m}, with m < (M÷2).
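The decomposition into centroid, ellipses and (for even M) one extra circle can be verified in code. In this hedged Java sketch (hypothetical names), `reconstruct` adds G_0, the circles of G_{-m} and G_{+m} for m = 1, ..., j (Eqn. (26.67), with j clamped to (M-1)÷2), and optionally the single circle of G_{M÷2}. With all (M-1)÷2 pairs the result is exact for odd M, whereas for even M the extra circle of Eqn. (26.68) is also required.

```java
/** Hypothetical sketch: reconstruction by superposition of centroid, FD-pair ellipses and (even M) one circle. */
final class EllipseSuperposition {

    /** Forward DFT with 1/M scaling (Eqn. (26.6)); complex values are {re, im} pairs. */
    static double[][] dft(double[][] g) {
        int M = g.length;
        double[][] G = new double[M][2];
        for (int m = 0; m < M; m++) {
            for (int k = 0; k < M; k++) {
                double phi = 2 * Math.PI * m * k / M;
                G[m][0] += (g[k][0] * Math.cos(phi) + g[k][1] * Math.sin(phi)) / M;
                G[m][1] += (-g[k][0] * Math.sin(phi) + g[k][1] * Math.cos(phi)) / M;
            }
        }
        return G;
    }

    /** Adds c * e^{i 2 pi m k / M} to the accumulator p. */
    private static void addTerm(double[] p, double[] c, int m, int k, int M) {
        double phi = 2 * Math.PI * m * k / M;
        p[0] += c[0] * Math.cos(phi) - c[1] * Math.sin(phi);
        p[1] += c[1] * Math.cos(phi) + c[0] * Math.sin(phi);
    }

    /** G_0 plus FD pairs 1..j (clamped to (M-1)/2); for even M optionally the circle of G_{M/2}. */
    static double[][] reconstruct(double[][] G, int j, boolean withHighestCircle) {
        int M = G.length;
        int pairs = Math.min(j, (M - 1) / 2);
        double[][] g = new double[M][2];
        for (int k = 0; k < M; k++) {
            addTerm(g[k], G[0], 0, k, M);                          // centroid G_0
            for (int m = 1; m <= pairs; m++) {
                addTerm(g[k], G[Math.floorMod(-m, M)], -m, k, M);  // clockwise circle of G_{-m}
                addTerm(g[k], G[m], m, k, M);                      // counter-clockwise circle of G_{+m}
            }
            if (withHighestCircle && M % 2 == 0) {
                addTerm(g[k], G[M / 2], M / 2, k, M);              // single circle of G_{M/2}
            }
        }
        return g;
    }

    /** Largest point distance between two contours of equal length. */
    static double maxError(double[][] a, double[][] b) {
        double e = 0;
        for (int k = 0; k < a.length; k++) {
            e = Math.max(e, Math.hypot(a[k][0] - b[k][0], a[k][1] - b[k][1]));
        }
        return e;
    }

    public static void main(String[] args) {
        double[][] odd = {{0, 0}, {3, 0}, {4, 2}, {2, 4}, {-1, 2}, {-2, 1}, {-1, -1}};
        double[][] even = {{0, 0}, {3, 0}, {4, 2}, {2, 4}, {-1, 2}, {-2, 1}};
        System.out.println("odd M, all pairs:          " + maxError(odd, reconstruct(dft(odd), 3, false)));
        System.out.println("even M, pairs only:        " + maxError(even, reconstruct(dft(even), 2, false)));
        System.out.println("even M, pairs plus circle: " + maxError(even, reconstruct(dft(even), 2, true)));
    }
}
```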


Reconstruction  algorithm 

Algorithm 26.4 describes the reconstruction of shapes from a Fourier descriptor using only a specified number (M_p) of Fourier descriptor pairs. The number of points on the reconstructed contour (N) can be freely chosen.
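As a rough illustration of this idea (a sketch, not Alg. 26.4 itself, whose steps after line 6 are not reproduced here), the hypothetical Java method below starts every output point at G_0 and adds the ellipse of each FD pair FP_1, ..., FP_{M_p} at the continuous path position t = k/N, using Eqns. (26.47) and (26.48). As in line 4 of Alg. 26.4, M_p is clamped to the (M-1)÷2 complete pairs, so the single circle of G_{M÷2} for even M is not included.

```java
/** Hypothetical sketch in the spirit of Alg. 26.4: N contour points from G_0 and up to Mp FD pairs. */
final class PartialShapeReconstruction {

    static double[][] getPartialReconstruction(double[][] G, int Mp, int N) {
        int M = G.length;
        int pairs = Math.min(Mp, (M - 1) / 2);        // available FD pairs (cf. Alg. 26.4, line 4)
        double[][] g = new double[N][2];
        for (int k = 0; k < N; k++) {
            double t = (double) k / N;                 // continuous path position in [0, 1)
            double x = G[0][0], y = G[0][1];           // start at the centroid G_0
            for (int m = 1; m <= pairs; m++) {
                double[] gMinus = G[M - m];            // G_{-m} is stored at index M - m
                double[] gPlus = G[m];
                double c = Math.cos(2 * Math.PI * m * t);
                double s = Math.sin(2 * Math.PI * m * t);
                x += (gMinus[0] + gPlus[0]) * c + (gMinus[1] - gPlus[1]) * s;  // Eqn. (26.47)
                y += (gMinus[1] + gPlus[1]) * c - (gMinus[0] - gPlus[0]) * s;  // Eqn. (26.48)
            }
            g[k][0] = x;
            g[k][1] = y;
        }
        return g;
    }

    public static void main(String[] args) {
        double[][] G = new double[9][2];               // descriptor with M = 9 coefficients
        G[0] = new double[] {1.0, 2.0};                // centroid
        G[1] = new double[] {0.4, 1.6};                // G_{+1}
        G[8] = new double[] {-2.0, 0.5};               // G_{-1}
        G[2] = new double[] {0.1, 0.0};                // G_{+2}, a small second ellipse
        double[][] pts = getPartialReconstruction(G, 1, 16);
        for (double[] p : pts) {
            System.out.printf("(%.3f, %.3f)%n", p[0], p[1]);
        }
    }
}
```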


¹⁴ In this case, M = 2·(M÷2) = (M-1)÷2 + 1 + M÷2.
Fig. 26.10

Partial shape reconstruction from a limited set of Fourier descriptor pairs. The full descriptor contains 125 coefficients (G_0 plus 62 FD pairs).


26.3.7  Fourier  Descriptors  from  Unsampled  Polygons 

The requirement to distribute sample points uniformly along the contour path stems from classical signal processing and Fourier theory, where uniform sampling is a common assumption. However, as shown in [143] (see also [183, 262]), the Fourier descriptors for a polygonal shape can be calculated directly from the original polygon vertices without sub-sampling the contour. This "trigonometric" approach, described in the following, works for arbitrary (convex and non-convex) polygons.

We  assume  that  the  shape  is  specified  as  a  sequence  of  P  points 
V  =  (u0,...,  vP_ i),  with  V(i)  =  vi  —  (xpi/i)  representing  the  2D 
vertices  of  a  closed  polygon.  We  define  the  quantities 

d(i)  =  v{i+1)modP-vi  and  X(i)  =  ||d(*)|| ,  (26.69) 

for  i  =  0, . . . ,  P—1,  where  d(i)  is  the  vector  representing  the  polygon 
^£2  segment  between  the  vertices  tq,iq+1,  and  A (i)  is  the  length  of  that 
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(c)  3  pairs,  t  =  0.3 


segment.  We  also  define 


i—i 


3=  0 


(26.70) 


Fig.  26.11 

Partial  reconstruction  by  el¬ 
lipse  superposition  (details). 
The  green  curve  shows  the 
partial  reconstruction  from 
1,  .  .  .  ,  9  FD  pairs.  This  curve 
performs  one  full  revolution 
as  the  path  parameter  t  runs 
from  0  to  1.  Subfigures  (a— i) 
depict  the  situation  for  1,  .  .  .  ,  9 
FD  pairs  and  different  path 
positions  t  =  0.1,  0.2,  .  .  .  ,  0.9. 
Each  Fourier  descriptor  pair 
corresponds  to  an  ellipse  that 
is  centered  at  the  current  posi¬ 
tion  t  on  the  previous  ellipse. 
The  individual  Fourier  descrip¬ 
tor  pair  FPX  in  (a)  corresponds 
to  a  single  ellipse.  In  (b),  the 
point  for  t  =  0.2  on  the  blue 
ellipse  (for  FP^  is  the  center 
of  the  red  ellipse  (for  FP2).  In 
(c),  the  green  ellipse  (for  FP3) 
is  centered  at  the  point  marked 
on  the  previous  ellipse,  and  so 
on.  The  reconstructed  shape 
is  obtained  by  superposition  of 
all  ellipses.  See  Fig.  26.12  for  a 
detailed  view. 


Fig. 26.12

Partial reconstruction by ellipse superposition (details). The green curve shows the partial reconstruction from 5 FD pairs $FP_1, \ldots, FP_5$. This curve performs one full revolution as the path parameter $t$ runs from 0 to 1. Subfigures (a-c) show the composition of the contour by superposition of the 5 ellipses, each corresponding to one FD pair, at selected positions $t = 0.0, 0.1, 0.2$.

The blue ellipse corresponds to $FP_1$ and revolves once for $t = 0, \ldots, 1$. The blue dot on this ellipse marks the position $t$, which serves as the center of the next (red) ellipse corresponding to $FP_2$. This ellipse makes 2 revolutions for $t = 0, \ldots, 1$, and the red dot for position $t$ is again the center of the green ellipse (for $FP_3$), and so on. Position $t$ on the orange ellipse (for $FP_5$) coincides with the final reconstruction (green curve). The original contour was sampled at 125 equidistant points.


for $i = 0, \ldots, P$, which is the cumulative length of the polygon path from the start vertex $v_0$ to vertex $v_i$, such that $L(0)$ is zero and $L(P)$ is the closed path length of the polygon $V$.
Alg. 26.4

Partial shape reconstruction from a truncated Fourier descriptor $G$. The shape is reconstructed by considering up to $M_p$ Fourier descriptor pairs. The resulting sequence of contour points may be of arbitrary length ($N$). See Figs. 26.10-26.12 for examples.


1: GetPartialReconstruction(G, Mp, N)
   Input: G = (G_0, ..., G_{M-1}), Fourier descriptor with M coefficients; Mp, number of Fourier descriptor pairs to consider; N, number of points on the reconstructed shape. Returns the reconstructed contour as a sequence of N complex values.
2:    Create map g: [0, N-1] → ℂ
3:    M ← |G|                                  ▷ total number of Fourier coefficients
4:    Mp ← min(Mp, (M - 1) ÷ 2)                ▷ available Fourier coefficient pairs
5:    for k ← 0, ..., N-1 do
6:       t ← k/N                               ▷ continuous path position t ∈ [0, 1]
7:       g(k) ← GetSinglePoint(G, -Mp, Mp, t)  ▷ see below
8:    return g.

9: GetSinglePoint(G, m_-, m_+, t)
   Returns a single point (as a complex value) on the reconstructed shape for the continuous path position t ∈ [0, 1], based on the Fourier coefficients G(m_-), ..., G(m_+).
10:   M ← |G|
11:   x ← 0, y ← 0
12:   for m ← m_-, ..., m_+ do
13:      φ ← 2·π·m·t
14:      G ← G(m mod M)
15:      A ← Re(G), B ← Im(G)
16:      x ← x + A·cos(φ) - B·sin(φ)
17:      y ← y + A·sin(φ) + B·cos(φ)
18:   return (x + i·y).
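A direct Java transcription of Alg. 26.4 might look as follows. This is a sketch, assuming complex values are stored as `double[] {re, im}` and that the descriptor `G` is an array of length M in which coefficient $G_m$ with negative $m$ is found at index $m \bmod M$, as in line 14. The class name is hypothetical.

```java
/** Partial shape reconstruction from a Fourier descriptor (after Alg. 26.4). */
final class PartialReconstruction {

    /** G[i] = {re, im}; coefficient G_m is stored at index m mod M. Returns N points {x, y}. */
    static double[][] getPartialReconstruction(double[][] G, int Mp, int N) {
        int M = G.length;
        Mp = Math.min(Mp, (M - 1) / 2);            // available FD pairs (line 4)
        double[][] g = new double[N][];
        for (int k = 0; k < N; k++) {
            double t = (double) k / N;              // continuous path position t in [0, 1)
            g[k] = getSinglePoint(G, -Mp, Mp, t);
        }
        return g;
    }

    /** Sum of G_m * e^{i 2 pi m t} for m = mMinus..mPlus, returned as {x, y}. */
    static double[] getSinglePoint(double[][] G, int mMinus, int mPlus, double t) {
        int M = G.length;
        double x = 0, y = 0;
        for (int m = mMinus; m <= mPlus; m++) {
            double phi = 2 * Math.PI * m * t;
            double[] Gm = G[Math.floorMod(m, M)];   // negative m wraps around to m mod M
            double A = Gm[0], B = Gm[1];
            x += A * Math.cos(phi) - B * Math.sin(phi);
            y += A * Math.sin(phi) + B * Math.cos(phi);
        }
        return new double[] {x, y};
    }
}
```

A descriptor whose only nonzero entries are $G_0 = 2 + 3\mathrm{i}$ and $G_{+1} = 1$ reconstructs to points on the unit circle centered at (2, 3), starting at (3, 3) for $t = 0$.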


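For a (freely chosen) number of Fourier descriptor pairs ($M_p$), the corresponding Fourier descriptor $G = (G_{-M_p}, \ldots, G_0, \ldots, G_{+M_p})$ has $2M_p + 1$ complex-valued coefficients $G_m$, where

$$G_0 = a_0 + \mathrm{i} \cdot c_0 \qquad (26.71)$$

and the remaining coefficients are calculated as

$$G_{+m} = \tfrac{1}{2} \cdot \big[ (a_m + d_m) + \mathrm{i} \cdot (c_m - b_m) \big], \qquad (26.72)$$

$$G_{-m} = \tfrac{1}{2} \cdot \big[ (a_m - d_m) + \mathrm{i} \cdot (c_m + b_m) \big], \qquad (26.73)$$

from the "trigonometric coefficients" $a_m, b_m, c_m, d_m$. As described in [143], the coefficients $a_0, c_0$ are obtained directly from the $P$ polygon vertices $v_i$ as

$$\begin{pmatrix} a_0 \\ c_0 \end{pmatrix} = v_0 + \frac{1}{L(P)} \sum_{i=0}^{P-1} \left[ \frac{L^2(i+1) - L^2(i)}{2\,\lambda(i)} \cdot d(i) + \lambda(i) \cdot \left( \sum_{j=0}^{i-1} d(j) - \frac{d(i)}{\lambda(i)} \cdot \sum_{j=0}^{i-1} \lambda(j) \right) \right] \qquad (26.74)$$

(representing the shape's center), with $d$, $\lambda$, $L$ as defined in Eqns. (26.69) and (26.70). This can be simplified to

$$\begin{pmatrix} a_0 \\ c_0 \end{pmatrix} = v_0 + \frac{1}{L(P)} \sum_{i=0}^{P-1} \left[ \left( \frac{L^2(i+1) - L^2(i)}{2\,\lambda(i)} - L(i) \right) \cdot d(i) + \lambda(i) \cdot (v_i - v_0) \right]. \qquad (26.75)$$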


1: FourierDescriptorFromPolygon(V, Mp)
   Input: V = (v_0, ..., v_{P-1}), a sequence of P points representing the vertices of a closed 2D polygon; Mp, the desired number of FD pairs. Returns a new Fourier descriptor of length 2Mp + 1.
2:    P ← |V|                                     ▷ number of polygon vertices in V
3:    M ← 2·Mp + 1                                ▷ number of Fourier coefficients in G
4:    Create maps d: [0, P-1] → ℝ², λ: [0, P-1] → ℝ,
         L: [0, P] → ℝ, G: [0, M-1] → ℂ
5:    L(0) ← 0
6:    for i ← 0, ..., P-1 do                      ▷ Eq. 26.69
7:       d(i) ← V((i+1) mod P) - V(i)
8:       λ(i) ← ‖d(i)‖
9:       L(i+1) ← L(i) + λ(i)
10:   (a, c) ← (0, 0)                             ▷ a ≡ a_0, c ≡ c_0
11:   for i ← 0, ..., P-1 do
12:      s ← (L²(i+1) - L²(i)) / (2·λ(i)) - L(i)
13:      (a, c) ← (a, c) + s·d(i) + λ(i)·(V(i) - V(0))     ▷ Eq. 26.75
14:   G(0) ← V(0) + (1/L(P))·(a + i·c)            ▷ Eq. 26.71
15:   for m ← 1, ..., Mp do                       ▷ for FD pairs G_{±1}, ..., G_{±Mp}
16:      (a, b, c, d) ← (0, 0, 0, 0)              ▷ a ≡ a_m, b ≡ b_m, c ≡ c_m, d ≡ d_m
17:      for i ← 0, ..., P-1 do
18:         ω_0 ← 2πm · L(i)/L(P)
19:         ω_1 ← 2πm · L((i+1) mod P)/L(P)
20:         (a, c) ← (a, c) + (cos(ω_1) - cos(ω_0))/λ(i) · d(i)   ▷ Eq. 26.76
21:         (b, d) ← (b, d) + (sin(ω_1) - sin(ω_0))/λ(i) · d(i)   ▷ Eq. 26.77
22:      (a, b, c, d) ← L(P)/(2π²m²) · (a, b, c, d)
23:      G(m) ← ½·[(a + d) + i·(c - b)]           ▷ Eq. 26.72
24:      G(-m mod M) ← ½·[(a - d) + i·(c + b)]    ▷ Eq. 26.73
25:   return G.
Alg. 26.5

Fourier descriptor from trigonometric data (arbitrary polygons). Parameter $M_p$ specifies the number of Fourier coefficient pairs.


The remaining coefficients $a_m, b_m, c_m, d_m$ ($m = 1, \ldots, M_p$) are calculated as

$$\begin{pmatrix} a_m \\ c_m \end{pmatrix} = \frac{L(P)}{2\pi^2 m^2} \sum_{i=0}^{P-1} \frac{\cos\!\big(2\pi m \frac{L(i+1)}{L(P)}\big) - \cos\!\big(2\pi m \frac{L(i)}{L(P)}\big)}{\lambda(i)} \cdot d(i), \qquad (26.76)$$

$$\begin{pmatrix} b_m \\ d_m \end{pmatrix} = \frac{L(P)}{2\pi^2 m^2} \sum_{i=0}^{P-1} \frac{\sin\!\big(2\pi m \frac{L(i+1)}{L(P)}\big) - \sin\!\big(2\pi m \frac{L(i)}{L(P)}\big)}{\lambda(i)} \cdot d(i), \qquad (26.77)$$

respectively. The complete calculation of a Fourier descriptor from trigonometric coordinates (i.e., from arbitrary polygons) is summarized in Alg. 26.5.

Fig. 26.13

Fourier descriptors calculated from trigonometric data (arbitrary polygons). Shape reconstructions with different numbers of Fourier descriptor pairs ($M_p$).

An approximate reconstruction of the original shape can be obtained directly from the trigonometric coefficients $a_m, b_m, c_m, d_m$ defined in Eqns. (26.75)-(26.77) as¹⁵

$$\begin{pmatrix} x(t) \\ y(t) \end{pmatrix} = \begin{pmatrix} a_0 \\ c_0 \end{pmatrix} + \sum_{m=1}^{M_p} \left[ \begin{pmatrix} a_m \\ c_m \end{pmatrix} \cdot \cos(2\pi m t) + \begin{pmatrix} b_m \\ d_m \end{pmatrix} \cdot \sin(2\pi m t) \right], \qquad (26.78)$$

for $t \in [0, 1]$. Of course, this reconstruction can also be calculated from the actual DFT coefficients $G$, as described in Eqn. (26.20). Again, the reconstruction error is reduced by increasing the number of Fourier descriptor pairs ($M_p$), as demonstrated in Fig. 26.13.¹⁶ The reconstruction is theoretically perfect as $M_p$ goes to infinity.
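The following Java code sketches Alg. 26.5. It assumes vertices are given as `{x, y}` pairs without repeated consecutive vertices, so that no $\lambda(i)$ is zero, and stores $G_m$ for negative $m$ at index $m \bmod M$. It uses the factor $L(P)/(2\pi^2 m^2)$ of Eqns. (26.76) and (26.77) together with the factor 1/2 of Eqns. (26.72) and (26.73). The class and method names are hypothetical and not those of the book's library.

```java
/** Fourier descriptor of a closed polygon from trigonometric coefficients (after Alg. 26.5). */
final class PolygonFourierDescriptor {

    /** V[i] = {x_i, y_i}; returns 2*Mp+1 coefficients {re, im}, G_m stored at index m mod M. */
    static double[][] fromPolygon(double[][] V, int Mp) {
        int P = V.length, M = 2 * Mp + 1;
        double[][] d = new double[P][2];
        double[] lambda = new double[P];
        double[] L = new double[P + 1];
        for (int i = 0; i < P; i++) {                        // Eqns. 26.69, 26.70
            int j = (i + 1) % P;
            d[i][0] = V[j][0] - V[i][0];
            d[i][1] = V[j][1] - V[i][1];
            lambda[i] = Math.hypot(d[i][0], d[i][1]);
            L[i + 1] = L[i] + lambda[i];
        }
        double LP = L[P];
        double[][] G = new double[M][2];

        double a = 0, c = 0;                                 // center (a_0, c_0), Eq. 26.75
        for (int i = 0; i < P; i++) {
            double s = (L[i + 1] * L[i + 1] - L[i] * L[i]) / (2 * lambda[i]) - L[i];
            a += s * d[i][0] + lambda[i] * (V[i][0] - V[0][0]);
            c += s * d[i][1] + lambda[i] * (V[i][1] - V[0][1]);
        }
        G[0][0] = V[0][0] + a / LP;                          // Eq. 26.71
        G[0][1] = V[0][1] + c / LP;

        for (int m = 1; m <= Mp; m++) {
            double am = 0, bm = 0, cm = 0, dm = 0;
            for (int i = 0; i < P; i++) {
                double w0 = 2 * Math.PI * m * L[i] / LP;
                double w1 = 2 * Math.PI * m * L[(i + 1) % P] / LP;
                double dc = (Math.cos(w1) - Math.cos(w0)) / lambda[i];
                double ds = (Math.sin(w1) - Math.sin(w0)) / lambda[i];
                am += dc * d[i][0];  cm += dc * d[i][1];     // Eq. 26.76
                bm += ds * d[i][0];  dm += ds * d[i][1];     // Eq. 26.77
            }
            double f = LP / (2 * Math.PI * Math.PI * m * m);
            am *= f;  bm *= f;  cm *= f;  dm *= f;
            G[m] = new double[] {0.5 * (am + dm), 0.5 * (cm - bm)};      // G_{+m}, Eq. 26.72
            G[M - m] = new double[] {0.5 * (am - dm), 0.5 * (cm + bm)};  // G_{-m}, Eq. 26.73
        }
        return G;
    }
}
```

For the unit square, $G_0$ comes out as the centroid (0.5, 0.5), and the coefficients $G_{\pm m}$ agree closely with the DFT of a densely and uniformly resampled outline. This agreement is a convenient sanity check for the scale factors.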

The trigonometric technique is particularly advantageous if the boundary curvature varies strongly along the outline. For example, the silhouette of a human hand typically exhibits high curvature along the fingertips, while other contour sections are almost straight. Capturing the high-curvature parts requires a significantly higher density of samples than in the smooth sections, as illustrated in Fig. 26.14. This figure compares the partial shape reconstructions obtained from Fourier descriptors calculated with uniform and non-uniform contour sampling, using identical numbers of Fourier descriptor pairs ($M_p$). Note that the coefficients (and thus the reconstructions) are very similar, although considerably fewer samples were used for the trigonometric approach.


15  Note  the  analogy  to  the  elliptical  reconstruction  in  Eqns.  (26.47)  and 
(26.48). 

16 Most test images used in this chapter were taken from the Kimia dataset [134]. A selected subset of modified images taken from this dataset is available on the book's website.
Fig. 26.14

Fourier descriptors from uniformly sampled vs. non-uniformly sampled (trigonometric) contours. Partial reconstructions from Fourier descriptors obtained from uniformly sampled contours (rows 1, 3) and non-uniformly sampled contours (rows 2, 4), for different numbers of Fourier descriptor pairs ($M_p$).


26.4  Effects  of  Geometric  Transformations 

To be useful for comparing shapes, a representation should be invariant against a certain set of geometric transformations. Typically, a minimal requirement for robust 2D shape matching is invariance to translation, scale changes, and rotation. Fourier shape descriptors in their basic form are not invariant under any of these transformations, but they can be modified to satisfy these requirements. In this section, we discuss the effects of such transformations upon the corresponding Fourier descriptors. The steps involved in making Fourier descriptors invariant are discussed subsequently in Sec. 26.5.

26.4.1  Translation 

As described in Sec. 26.3.1, the coefficient $G_0$ of a Fourier descriptor $G$ corresponds to the centroid of the encoded contour. Moving the points $g_k$ of a shape $g$ in the complex plane by some constant $z \in \mathbb{C}$,

$$g'_k = g_k + z \qquad (26.79)$$

for $k = 0, \ldots, M-1$, only affects Fourier coefficient $G_0$, that is,

$$G'_m = \begin{cases} G_m + z & \text{for } m = 0, \\ G_m & \text{for } m \neq 0. \end{cases} \qquad (26.80)$$

To make an FD invariant against translation, it is thus sufficient to zero its $G_0$ coefficient, thereby shifting the shape's center to the origin of the coordinate system. Alternatively, translation-invariant matching of Fourier descriptors is achieved by simply ignoring coefficient $G_0$.
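The translation property in Eqn. (26.80) can be checked numerically with a few lines of Java. This is a minimal sketch, assuming contour points and coefficients are stored as `double[] {re, im}` and using a naive O(M²) DFT with the 1/M scaling of Table 26.1. The class `TranslationCheck` and its methods are hypothetical helpers, not part of the book's library.

```java
/** Translation of a contour and translation-invariant descriptors (Eqns. 26.79, 26.80). */
final class TranslationCheck {

    /** Forward DFT with 1/M scaling: G_m = (1/M) sum_k g_k e^{-i 2 pi m k / M}. */
    static double[][] dft(double[][] g) {
        int M = g.length;
        double[][] G = new double[M][2];
        for (int m = 0; m < M; m++) {
            for (int k = 0; k < M; k++) {
                double phi = -2 * Math.PI * m * k / M;
                double c = Math.cos(phi), s = Math.sin(phi);
                G[m][0] += (g[k][0] * c - g[k][1] * s) / M;   // Re(g_k e^{i phi}) / M
                G[m][1] += (g[k][0] * s + g[k][1] * c) / M;   // Im(g_k e^{i phi}) / M
            }
        }
        return G;
    }

    /** Moves all contour points by z = zx + i*zy (Eq. 26.79). */
    static double[][] translate(double[][] g, double zx, double zy) {
        double[][] h = new double[g.length][];
        for (int k = 0; k < g.length; k++) h[k] = new double[] {g[k][0] + zx, g[k][1] + zy};
        return h;
    }

    /** Returns a copy of G with G_0 set to zero, i.e., the shape centered at the origin. */
    static double[][] makeTranslationInvariant(double[][] G) {
        double[][] H = new double[G.length][];
        for (int m = 0; m < G.length; m++) H[m] = G[m].clone();
        H[0][0] = 0;
        H[0][1] = 0;
        return H;
    }
}
```

Applying `makeTranslationInvariant` to the spectra of a contour and of its translated copy yields the same coefficients up to rounding.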


26.4.2  Scale  Change 

Since the Fourier transform is a linear operation, scaling a 2D shape $g$ uniformly by a real-valued factor $s$,

$$g'_k = s \cdot g_k, \qquad (26.81)$$

also scales the corresponding Fourier spectrum by the same factor, that is,

$$G'_m = s \cdot G_m, \qquad (26.82)$$

for $m = 1, \ldots, M-1$. Note that scaling by $s = -1$ (or any other negative factor) corresponds to reversing the ordering of the samples along the contour (see also Sec. 26.4.6). Given the fact that the DFT coefficient $G_1$ represents a circle whose radius $r_1 = |G_1|$ is proportional to the size of the original shape (see Sec. 26.3.2), the Fourier descriptor $G$ could be normalized for scale by setting

$$G^S_m = \frac{1}{|G_1|} \cdot G_m, \qquad (26.83)$$

for $m = 1, \ldots, M-1$, such that $|G^S_1| = 1$. Although it is common to use only $G_1$ for scale normalization, this coefficient may be relatively small (and thus unreliable) for certain shapes. We therefore prefer to normalize the complete Fourier coefficient vector to achieve scale invariance (see Sec. 26.5.1).
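The following Java sketch illustrates the scale normalization of Eqn. (26.83). It assumes $|G_1| > 0$ and complex values stored as `double[] {re, im}`, and it reuses the same naive DFT as in the translation example. The class name `ScaleByFirstCoefficient` is hypothetical.

```java
/** Scale normalization by |G_1| as in Eq. (26.83). */
final class ScaleByFirstCoefficient {

    /** Forward DFT with 1/M scaling: G_m = (1/M) sum_k g_k e^{-i 2 pi m k / M}. */
    static double[][] dft(double[][] g) {
        int M = g.length;
        double[][] G = new double[M][2];
        for (int m = 0; m < M; m++) {
            for (int k = 0; k < M; k++) {
                double phi = -2 * Math.PI * m * k / M;
                double c = Math.cos(phi), s = Math.sin(phi);
                G[m][0] += (g[k][0] * c - g[k][1] * s) / M;
                G[m][1] += (g[k][0] * s + g[k][1] * c) / M;
            }
        }
        return G;
    }

    /** Returns a copy of G with G_1, ..., G_{M-1} divided by |G_1|; G_0 is copied unchanged. */
    static double[][] normalizeByG1(double[][] G) {
        double r1 = Math.hypot(G[1][0], G[1][1]);   // radius of the circle for G_1 (assumed > 0)
        double[][] H = new double[G.length][];
        H[0] = G[0].clone();
        for (int m = 1; m < G.length; m++) {
            H[m] = new double[] {G[m][0] / r1, G[m][1] / r1};
        }
        return H;
    }
}
```

For any positive factor $s$, the normalized coefficients of $g$ and $s \cdot g$ coincide for $m \geq 1$, since both numerator and $|G_1|$ scale by $s$.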


26.4.3  Rotation 


If a given shape is rotated about the origin by some angle $\beta$, then each contour point $v_k = (x_k, y_k)$ moves to a new position

$$\begin{pmatrix} x'_k \\ y'_k \end{pmatrix} = \begin{pmatrix} \cos\beta & -\sin\beta \\ \sin\beta & \cos\beta \end{pmatrix} \cdot \begin{pmatrix} x_k \\ y_k \end{pmatrix}. \qquad (26.84)$$

If the 2D contour samples are represented as complex values $g_k = x_k + \mathrm{i} \cdot y_k$, this rotation can be expressed as a multiplication

$$g'_k = e^{\mathrm{i}\beta} \cdot g_k, \qquad (26.85)$$

with the complex factor $e^{\mathrm{i}\beta} = \cos\beta + \mathrm{i} \cdot \sin\beta$. As in Eqn. (26.82), we can use the linearity of the DFT to predict the effects of rotating the shape $g$ by angle $\beta$ as

$$G'_m = e^{\mathrm{i}\beta} \cdot G_m, \qquad (26.86)$$

for $m = 0, \ldots, M-1$. Thus, the spatial rotation in Eqn. (26.85) multiplies each DFT coefficient $G_m$ by the same complex factor $e^{\mathrm{i}\beta}$, which has unit magnitude. Since

$$e^{\mathrm{i}\beta} \cdot G_m = e^{\mathrm{i}\beta} \cdot |G_m| \cdot e^{\mathrm{i}\varphi_m} = |G_m| \cdot e^{\mathrm{i}(\varphi_m + \beta)}, \qquad (26.87)$$

this only rotates the phase $\varphi_m = \angle G_m$ of each coefficient by the same angle $\beta$, without changing its magnitude $|G_m|$.
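Eqns. (26.85) and (26.86) can be verified numerically as follows. This is a sketch, assuming the same `double[] {re, im}` storage and naive DFT as in the previous examples. The class `RotationCheck` is a hypothetical helper.

```java
/** Effect of rotating a contour about the origin on its DFT (Eqns. 26.85, 26.86). */
final class RotationCheck {

    /** Forward DFT with 1/M scaling: G_m = (1/M) sum_k g_k e^{-i 2 pi m k / M}. */
    static double[][] dft(double[][] g) {
        int M = g.length;
        double[][] G = new double[M][2];
        for (int m = 0; m < M; m++) {
            for (int k = 0; k < M; k++) {
                double phi = -2 * Math.PI * m * k / M;
                double c = Math.cos(phi), s = Math.sin(phi);
                G[m][0] += (g[k][0] * c - g[k][1] * s) / M;
                G[m][1] += (g[k][0] * s + g[k][1] * c) / M;
            }
        }
        return G;
    }

    /** Returns g'_k = e^{i beta} * g_k, i.e., all points rotated about the origin by beta. */
    static double[][] rotate(double[][] g, double beta) {
        double c = Math.cos(beta), s = Math.sin(beta);
        double[][] h = new double[g.length][];
        for (int k = 0; k < g.length; k++) {
            h[k] = new double[] {c * g[k][0] - s * g[k][1], s * g[k][0] + c * g[k][1]};
        }
        return h;
    }
}
```

Every coefficient of `dft(rotate(g, beta))` equals the corresponding coefficient of `dft(g)` multiplied by $e^{\mathrm{i}\beta}$, so all magnitudes are unchanged.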


26.4.4  Shifting  the  Sampling  Start  Position 

Despite the implicit periodicity of the boundary sequence and the corresponding DFT spectrum, Fourier descriptors are generally not the same if sampling starts at different positions along the contour. Given a periodic sequence of $M$ discrete contour samples $g = (g_0, \ldots, g_{M-1})$, we select another sequence $g' = (g'_0, g'_1, \ldots) = (g_{k_s}, g_{k_s+1}, \ldots)$, again of length $M$, from the same set of samples but starting at point $k_s$, that is,

$$g'_k = g_{(k+k_s) \bmod M}. \qquad (26.88)$$

This is equivalent to shifting the original signal $g$ circularly by $-k_s$ positions. The well-known "shift property" of the Fourier transform¹⁷ states that such a change to the "signal" $g$ modifies the corresponding DFT coefficients $G_m$ (for the original contour sequence) to

$$G'_m = e^{\mathrm{i} \cdot m \cdot \frac{2\pi k_s}{M}} \cdot G_m = e^{\mathrm{i} \cdot m \cdot \varphi_s} \cdot G_m, \qquad (26.89)$$

where $\varphi_s = 2\pi k_s / M$ is a constant phase angle that is proportional to the chosen start position $k_s$. Note that, in Eqn. (26.89), each DFT coefficient $G_m$ is multiplied by a different complex quantity $e^{\mathrm{i} \cdot m \cdot \varphi_s}$, which is of unit magnitude and varies with the frequency index $m$. In other words, the magnitude of any DFT coefficient $G_m$ is again preserved, but its phase changes individually. The coefficients of any Fourier descriptor pair $FP_m = (G_{-m}, G_{+m})$ thus become

$$G'_{-m} = e^{-\mathrm{i} \cdot m \cdot \varphi_s} \cdot G_{-m} \quad\text{and}\quad G'_{+m} = e^{\mathrm{i} \cdot m \cdot \varphi_s} \cdot G_{+m}, \qquad (26.90)$$

that is, coefficient $G_{-m}$ is rotated by the angle $-m \cdot \varphi_s$ and $G_{+m}$ is rotated by $m \cdot \varphi_s$. In other words, a circular shift of the signal by $-k_s$ samples rotates the coefficients $G_{-m}, G_{+m}$ by the same angle $m \cdot \varphi_s$ but in opposite directions. Therefore, the sum of both angles stays the same (modulo $2\pi$), that is,

$$\angle G'_{-m} + \angle G'_{+m} \equiv \angle G_{-m} + \angle G_{+m}. \qquad (26.91)$$
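The shift property and the invariant phase sum can be illustrated in Java as follows. This is a sketch under the same storage assumptions as above. The phase sum in Eqn. (26.91) is represented by the angle of the complex product $G_{-m} \cdot G_{+m}$, whose value is unaffected by the start point shift. The class name `StartShiftCheck` is hypothetical.

```java
/** Effect of shifting the sampling start point on the DFT (Eqns. 26.88-26.91). */
final class StartShiftCheck {

    /** Forward DFT with 1/M scaling: G_m = (1/M) sum_k g_k e^{-i 2 pi m k / M}. */
    static double[][] dft(double[][] g) {
        int M = g.length;
        double[][] G = new double[M][2];
        for (int m = 0; m < M; m++) {
            for (int k = 0; k < M; k++) {
                double phi = -2 * Math.PI * m * k / M;
                double c = Math.cos(phi), s = Math.sin(phi);
                G[m][0] += (g[k][0] * c - g[k][1] * s) / M;
                G[m][1] += (g[k][0] * s + g[k][1] * c) / M;
            }
        }
        return G;
    }

    /** Returns g'_k = g_{(k + ks) mod M}: the same contour sampled from start index ks (Eq. 26.88). */
    static double[][] shiftStart(double[][] g, int ks) {
        int M = g.length;
        double[][] h = new double[M][];
        for (int k = 0; k < M; k++) h[k] = g[Math.floorMod(k + ks, M)].clone();
        return h;
    }

    /** Complex product G_{-m} * G_{+m}; its angle is the phase sum of Eq. (26.91). */
    static double[] pairProduct(double[][] G, int m) {
        int M = G.length;
        double[] a = G[Math.floorMod(-m, M)], b = G[m];
        return new double[] {a[0] * b[0] - a[1] * b[1], a[0] * b[1] + a[1] * b[0]};
    }
}
```

Because the factors $e^{-\mathrm{i} m \varphi_s}$ and $e^{\mathrm{i} m \varphi_s}$ cancel in the product, `pairProduct` returns the same value before and after the shift.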


17 See Chapter 18, Sec. 18.1.6.
Fig. 26.15 (panels: start at 0%, start at 5%, start at 10%)

Effects of choosing different start points for contour sampling. The start point (marked X on the contour) is set to 0%, 5%, 10% of the contour path length. The blue and green circles represent the partial reconstruction from single DFT coefficients $G_{-1}$ and $G_{+1}$, respectively. The dot on each circle and the associated radial line show the phase of the corresponding coefficient. The black line indicates the average orientation $(\angle G_{-1} + \angle G_{+1})/2$. It can be seen that the phase difference of $G_{-1}$ and $G_{+1}$ is directly related to the start position, but the average orientation (black line) remains unchanged.


In particular, we see from Eqn. (26.90) that shifting the start position modifies the coefficients of the first descriptor pair $FP_1 = (G_{-1}, G_{+1})$ to

$$G'_{-1} = e^{-\mathrm{i} \cdot \varphi_s} \cdot G_{-1} \quad\text{and}\quad G'_{+1} = e^{\mathrm{i} \cdot \varphi_s} \cdot G_{+1}. \qquad (26.92)$$

The resulting absolute phase change of the coefficients $G_{-1}, G_{+1}$ is $-\varphi_s$ and $+\varphi_s$, respectively, and thus the change in phase difference is $2 \cdot \varphi_s$; that is, the phase difference between the coefficients $G_{-1}, G_{+1}$ is proportional to the chosen start position $k_s$ (see Fig. 26.15).

26.4.5  Effects  of  Phase  Removal 

As described in the two previous sections, shape rotation (Sec. 26.4.3) and shift of start point (Sec. 26.4.4) both affect the phase of the Fourier coefficients but not their magnitude. The fact that magnitude is preserved suggests a simple solution for rotation-invariant shape matching: ignore the phase of the coefficients and compare only their magnitude (see Sec. 26.6). Although this comes at the price of losing shape descriptiveness, magnitude-only descriptors are often used for shape matching. Clearly, the original shape cannot be reconstructed from a magnitude-only Fourier descriptor, as demonstrated in Fig. 26.16. It shows the reconstruction of shapes from Fourier descriptors with the phase of all coefficients set to zero, except for $G_{-1}$, $G_0$, and $G_{+1}$ (to preserve the shape's center and main orientation).
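The zero-phase descriptors shown in Fig. 26.16 can be produced by a small transformation of the coefficient array. This is a sketch, assuming coefficients stored as `double[] {re, im}` with $G_{-1}$ at the last index. The class name `PhaseRemoval` is hypothetical.

```java
/** Zero-phase Fourier descriptor as used for Fig. 26.16. */
final class PhaseRemoval {

    /** Copy of G in which every coefficient except G_{-1}, G_0, G_{+1} is replaced by its magnitude. */
    static double[][] removePhase(double[][] G) {
        int M = G.length;
        double[][] H = new double[M][];
        for (int i = 0; i < M; i++) {
            boolean keep = (i == 0 || i == 1 || i == M - 1);   // indices of G_0, G_{+1}, G_{-1}
            H[i] = keep ? G[i].clone() : new double[] {Math.hypot(G[i][0], G[i][1]), 0.0};
        }
        return H;
    }
}
```

The result keeps every magnitude and the coefficients that fix the shape's center and main orientation. All other coefficients become real and non-negative, which is why reconstructions from such descriptors deviate strongly from the original shapes.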


Fig. 26.16

Effects of removing phase information. Original shapes and reconstruction after phase removal (a-c). Original Fourier coefficients (d-f) and zero-phase coefficients (g-i). The red and green plots in (d-i) show the real and imaginary components, respectively; gray plots show the coefficient magnitude. Dark-shaded bars correspond to the actual values, light-shaded bars are logarithmic values. The magnitude of the coefficients in (d-f) is the same as in (g-i).


26.4.6  Direction  of  Contour  Traversal 


If the traversal direction of the contour samples is reversed, the coefficients of all Fourier descriptor pairs are exchanged, that is,

$$G'_m = G_{-m \bmod M}. \qquad (26.93)$$

This is equivalent to scaling the original shape by $s = -1$, as pointed out in Section 26.4.2. However, this is typically of no relevance in matching, since we can specify all contours to be sampled in either clockwise or counter-clockwise direction.


26.4.7  Reflection  (Symmetry) 

Mirroring or reflecting a contour about the $x$-axis is equivalent to replacing each complex-valued point $g_k = x_k + \mathrm{i} \cdot y_k$ by its complex conjugate $g_k^*$, that is,

$$g'_k = g_k^* = x_k - \mathrm{i} \cdot y_k. \qquad (26.94)$$

This change to the "signal" results in a modified DFT spectrum with coefficients

$$G'_m = G^*_{-m \bmod M}, \qquad (26.95)$$
Table 26.1

Effects of spatial transformations upon the corresponding DFT spectrum. The original contour samples are denoted $g_k$, the DFT coefficients are $G_m$.

| Operation | Contour samples | DFT coefficients |
|---|---|---|
| Forward transformation | $g_k$, for $k = 0, \ldots, M-1$ | $G_m = \frac{1}{M} \sum_{k=0}^{M-1} g_k \cdot e^{-\mathrm{i} 2\pi m \frac{k}{M}}$ |
| Inverse transformation | $g_k = \sum_{m=0}^{M-1} G_m \cdot e^{\mathrm{i} 2\pi m \frac{k}{M}}$ | $G_m$, for $m = 0, \ldots, M-1$ |
| Translation (by $z \in \mathbb{C}$) | $g'_k = g_k + z$ | $G'_m = G_m + z$ for $m = 0$, $G'_m = G_m$ otherwise |
| Uniform scaling (by $s \in \mathbb{R}$) | $g'_k = s \cdot g_k$ | $G'_m = s \cdot G_m$ |
| Rotation about the origin (by $\beta$) | $g'_k = e^{\mathrm{i}\beta} \cdot g_k$ | $G'_m = e^{\mathrm{i}\beta} \cdot G_m$ |
| Shift of start position (by $k_s$) | $g'_k = g_{(k+k_s) \bmod M}$ | $G'_m = e^{\mathrm{i} \cdot m \cdot \frac{2\pi k_s}{M}} \cdot G_m$ |
| Direction of contour traversal | $g'_k = g_{-k \bmod M}$ | $G'_m = G_{-m \bmod M}$ |
| Reflection about the $x$-axis | $g'_k = g_k^*$ | $G'_m = G^*_{-m \bmod M}$ |

where $G^*$ denotes the complex conjugate of the original DFT coefficients. Reflections about arbitrary axes can be described in the same way with additional rotations. Fourier descriptors can be made invariant against reflections, such that symmetric contours map to equivalent descriptors [245]. Note, however, that invariance to symmetry is not always desirable, for example, for distinguishing the silhouettes of left and right hands.

The relations between 2D point coordinates and the Fourier spectrum, as well as the effects of the aforementioned geometric shape transformations upon the DFT coefficients, are compactly summarized in Table 26.1.
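The last two rows of Table 26.1 can be checked numerically in the same way as the other transformations. This is a sketch under the same storage assumptions, with a hypothetical class name. Reversal keeps the start point $g_0$ and visits the remaining samples in opposite order.

```java
/** Traversal reversal and reflection about the x-axis (last two rows of Table 26.1). */
final class TraversalAndReflectionCheck {

    /** Forward DFT with 1/M scaling: G_m = (1/M) sum_k g_k e^{-i 2 pi m k / M}. */
    static double[][] dft(double[][] g) {
        int M = g.length;
        double[][] G = new double[M][2];
        for (int m = 0; m < M; m++) {
            for (int k = 0; k < M; k++) {
                double phi = -2 * Math.PI * m * k / M;
                double c = Math.cos(phi), s = Math.sin(phi);
                G[m][0] += (g[k][0] * c - g[k][1] * s) / M;
                G[m][1] += (g[k][0] * s + g[k][1] * c) / M;
            }
        }
        return G;
    }

    /** g'_k = g_{-k mod M}: same points and start point, opposite traversal direction. */
    static double[][] reverse(double[][] g) {
        int M = g.length;
        double[][] h = new double[M][];
        for (int k = 0; k < M; k++) h[k] = g[Math.floorMod(-k, M)].clone();
        return h;
    }

    /** g'_k = conj(g_k): reflection about the x-axis (Eq. 26.94). */
    static double[][] reflectX(double[][] g) {
        double[][] h = new double[g.length][];
        for (int k = 0; k < g.length; k++) h[k] = new double[] {g[k][0], -g[k][1]};
        return h;
    }
}
```

For every $m$, `dft(reverse(g))[m]` matches $G_{-m \bmod M}$ and `dft(reflectX(g))[m]` matches its complex conjugate, as stated in Eqns. (26.93) and (26.95).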


26.5  Transformation-Invariant  Fourier  Descriptors 

As mentioned already, making a Fourier descriptor invariant to translation or absolute shape position is easy because the only affected spectral coefficient is $G_0$. Thus, setting coefficient $G_0$ to zero implicitly moves the center of the corresponding shape to the coordinate origin and thus creates a descriptor that is invariant to shape translation.

Invariance against a change in scale is also a simple issue because it only multiplies the magnitude of all Fourier coefficients by the same real-valued scale factor, which can be easily normalized.

A more challenging task is to make Fourier descriptors invariant against shape rotation and shift of the contour starting point, because they jointly affect the phase of the Fourier coefficients. If matching is to be based on the complex-valued Fourier descriptors (not on coefficient magnitude only) to achieve better shape discrimination, the phase changes introduced by shape rotation and start point shifts must be eliminated first. However, due to noise and possible ambiguities, this is not a trivial problem (see also [183, 184, 189, 245]).
26.5.1 Scale Invariance

As mentioned in Section 26.4.2, the magnitude $|G_{+1}|$ is often used as a reference to normalize for scale, since $G_{+1}$ is typically (though not always) the Fourier coefficient with the largest magnitude. Alternatively, one could use the size of the fundamental ellipse, defined by the Fourier descriptor pair $FP_1$, to measure the overall scale, for example, by normalizing to

$$G^S_m = \frac{1}{|G_{-1}| + |G_{+1}|} \cdot G_m, \qquad (26.96)$$

which normalizes the length of the major axis $a_1 = |G_{-1}| + |G_{+1}|$ (see Eqn. (26.57)) of the fundamental ellipse to unity. Another alternative is


(26.97) 


which normalizes the area of the fundamental ellipse. Since all variants in Eqns. (26.83), (26.96), and (26.97) scale the coefficients $G_m$ by a fixed (real-valued) factor, the shape information contained in the Fourier descriptor remains unchanged.

There are shapes, however, where coefficients $G_{+1}$ and/or $G_{-1}$ are small or vanish almost completely, such that they are not always a reliable reference for scale. An obvious solution is to include the complete set of Fourier coefficients by standardizing the norm of the coefficient vector $G$ to unity in the form

$$G_m \leftarrow \frac{1}{\|G\|} \cdot G_m \qquad (26.98)$$

(assuming that $G_0 = 0$). In general, the $L_2$ norm of a complex-valued vector $\mathbf{z} = (z_0, z_1, \ldots, z_{M-1})$, $z_i \in \mathbb{C}$, is defined as

$$\|\mathbf{z}\| = \Big( \sum_{i=0}^{M-1} |z_i|^2 \Big)^{1/2} = \Big( \sum_{i=0}^{M-1} \mathrm{Re}(z_i)^2 + \mathrm{Im}(z_i)^2 \Big)^{1/2}. \qquad (26.99)$$

Scaling the vector $\mathbf{z}$ by the reciprocal of its norm yields a vector with unit norm, that is,

$$\Big\| \frac{1}{\|\mathbf{z}\|} \cdot \mathbf{z} \Big\| = 1. \qquad (26.100)$$

To normalize a given Fourier descriptor $G$, we use all elements except $G_0$ (which relates to the absolute position of the shape and is irrelevant to its form). The following substitution makes $G$ scale invariant by normalizing the remaining sub-vector $(G_1, G_2, \ldots, G_{M-1})$ to unit norm:

$$G_m \leftarrow \begin{cases} G_m & \text{for } m = 0, \\ \frac{1}{\sqrt{\nu}} \cdot G_m & \text{for } 1 \le m < M, \end{cases} \quad\text{with}\quad \nu = \sum_{m=1}^{M-1} |G_m|^2. \qquad (26.101)$$

See procedure MakeScaleInvariant(G) in Alg. 26.6 (lines 7-15) for a summary of this step.
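Procedure MakeScaleInvariant(G) of Alg. 26.6 (lines 7-15) can be transcribed into Java roughly as follows. This is a sketch, assuming the descriptor array has at least $2M_p + 1$ entries, so that the indices of $G_{-m}$ and $G_{+m}$ never coincide, and that $G_{-m}$ is stored at index $m \bmod M$. The class name is hypothetical.

```java
/** Scale normalization of a Fourier descriptor (after Alg. 26.6, lines 7-15). */
final class ScaleInvariance {

    /** Scales G_{-m}, G_{+m} (m = 1..Mp) in place to unit L2 norm; G_0 is untouched. Returns nu. */
    static double makeScaleInvariant(double[][] G, int Mp) {
        int M = G.length;
        double s = 0;
        for (int m = 1; m <= Mp; m++) {
            double[] a = G[Math.floorMod(-m, M)], b = G[m];
            s += a[0] * a[0] + a[1] * a[1] + b[0] * b[0] + b[1] * b[1];   // |G_-m|^2 + |G_m|^2
        }
        double nu = 1 / Math.sqrt(s);
        for (int m = 1; m <= Mp; m++) {
            for (double[] c : new double[][] {G[Math.floorMod(-m, M)], G[m]}) {
                c[0] *= nu;   // the arrays are modified in place
                c[1] *= nu;
            }
        }
        return nu;
    }
}
```

For a descriptor with $G_{+1} = 3$ and $G_{-1} = 4\mathrm{i}$ and all other pairs zero, the returned factor is $\nu = 1/5$, and the normalized coefficients become 0.6 and 0.8i.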
26.5.2 Start Point Invariance

As discussed in Sections 26.4.3 and 26.4.4, respectively, shape rotation and shift of start point both affect the phase of the Fourier coefficients in a combined manner, without altering their magnitude. In particular, if the shape is rotated by some angle $\beta$ (see Eqn. (26.86)) and the start position is shifted by $k_s$ samples (see Eqn. (26.89)), then each Fourier coefficient $G_m$ is modified to

$$G'_m = e^{\mathrm{i}\beta} \cdot e^{\mathrm{i} \cdot m \cdot \varphi_s} \cdot G_m = e^{\mathrm{i}(\beta + m \cdot \varphi_s)} \cdot G_m, \qquad (26.102)$$

where $\varphi_s = 2\pi k_s / M$ is the corresponding start point phase. Thus, the incurred phase shift is not only different for each coefficient but simultaneously depends on the rotation angle $\beta$ and the start point phase $\varphi_s$. Normalization in this case means removing these phase shifts, which would be straightforward if $\beta$ and $\varphi_s$ were known. We derive these two parameters one after the other, starting with the calculation of the start point phase $\varphi_s$, which we describe in this section, followed by the estimation of the rotation $\beta$, shown subsequently in Section 26.5.3.

To normalize the Fourier descriptor of a particular shape to a "canonical" start point, we need a quantity that can be calculated from the Fourier spectrum and only depends on the start point phase $\varphi_s$ but is independent of the rotation $\beta$. From Eqn. (26.90) and Fig. 26.15 we see that the phase difference within any Fourier descriptor pair $(G_{-m}, G_{+m})$ is proportional to the start point phase $\varphi_s$ and independent of shape rotation $\beta$, since the latter rotates all coefficients by the same angle. Thus, we look for a quantity that depends only on the phase differences within Fourier descriptor pairs. This is accomplished, for example, by the function

$$f_p(\varphi) = \sum_{m=1}^{M_p} \big[ e^{-\mathrm{i} \cdot m \cdot \varphi} \cdot G_{-m} \big] \otimes \big[ e^{\mathrm{i} \cdot m \cdot \varphi} \cdot G_{+m} \big], \qquad (26.103)$$

where parameter $\varphi$ is an arbitrary start point phase, $M_p$ is the number of coefficient pairs, and $\otimes$ denotes the "cross product" between two Fourier coefficients.¹⁸ Given a particular start point phase $\varphi$, the function in Eqn. (26.103) yields the sum of the cross products of each coefficient pair $(G_{-m}, G_{+m})$, for $m = 1, \ldots, M_p$. If each of the complex-valued coefficients is interpreted as a vector in the 2D plane, the magnitude of their cross product is proportional to the area of the enclosed parallelogram. The enclosed area is potentially large only if both vectors are of significant length, which means that the corresponding ellipse has a distinct eccentricity and orientation. Note that the sign of the cross product may be positive or negative and depends on the relative orientation or "handedness" of the two vectors.

Since the function $f_p(\varphi)$ is based only on the relative orientation (phase) of the involved coefficients, it is invariant to a shape rotation

18 In analogy to 2D vector notation, we define the "cross product" of two complex quantities $z_1 = (a_1, b_1)$ and $z_2 = (a_2, b_2)$, with phase angles $\theta_1, \theta_2$, as $z_1 \otimes z_2 = a_1 \cdot b_2 - b_1 \cdot a_2 = |z_1| \cdot |z_2| \cdot \sin(\theta_2 - \theta_1)$. See also Sec. B.3.3 in the Appendix.
Table 26.2 (panel (b): rotation $\beta = 15^\circ$, start point phase $\varphi_s = 0^\circ$)

Plot of the function $f_p(\varphi)$ used for start point normalization. In the figures on the left, the real start point is marked by a black dot. The normalized start points $\varphi_A$ and $\varphi_B = \varphi_A + \pi$ are marked by a blue and a brown cross, respectively. They correspond to the two peak positions of the function $f_p(\varphi)$, as defined in Eqn. (26.103), separated by a fixed phase shift of $\pi = 180^\circ$ (right). The function is invariant under shape rotation, as demonstrated in (b), where the shape is rotated by $15^\circ$ but sampled from the same start point as in (a). However, the phase of $f_p(\varphi)$ is proportional to the start point shift, as shown in (c), where the start point is chosen at 25% ($\varphi_s = 90^\circ$) of the boundary path length. The functions were calculated after scale normalization, using $M_p = 25$ Fourier coefficient pairs.


$\beta$, which shifts all coefficients by the same angle (see Eqn. (26.86)). As shown in Table 26.2, $f_p(\varphi)$ is periodic with $\pi$ and its phase is proportional to the actual start point shift. We choose the angle $\varphi$ that maximizes $f_p(\varphi)$ as the "canonical" start point phase $\varphi_A$, that is,

$$\varphi_A = \operatorname*{argmax}_{0 \le \varphi < \pi} f_p(\varphi). \qquad (26.104)$$

However, since $f_p(\varphi) = f_p(\varphi + \pi)$, there is also a second candidate phase

$$\varphi_B = \varphi_A + \pi, \qquad (26.105)$$

displaced by $\pi = 180^\circ$. The two "canonical" start points corresponding to $\varphi_A$ and $\varphi_B$, respectively, are marked on the reconstructed shapes in Table 26.2. Although it might seem easy at first to resolve this $180^\circ$ ambiguity of the start point phase, this turns out to be difficult to achieve in general from the Fourier coefficients alone. Several functions have been proposed for this purpose that work well for certain shapes but fail on others, including the "positive real energy" function suggested in [245]. In particular, any decision based on the magnitude or phase of a single coefficient (or a single coefficient pair) must eventually fail, since none of the coefficients is guaranteed to have a significant magnitude. With vanishing coefficient magnitude, phase measurements become unreliable and may be very susceptible to noise.

The complete process of start point normalization is summarized in Alg. 26.7. The start point phase $\varphi_A$ is found numerically by evaluating the function $f_p(\varphi)$ at 400 discrete steps for $\varphi \in [0, \pi)$ (lines 6-16). For practical use, this exhaustive method should be replaced by a more efficient and accurate optimization technique (for example, Brent's method [190, Ch. 10]).¹⁹ Given the estimated start point phase $\varphi_A$ for the Fourier descriptor $G$, two normalized versions $G^A, G^B$ are calculated as

$$G^A_m \leftarrow e^{\mathrm{i} \cdot m \cdot \varphi_A} \cdot G_m, \qquad G^B_m \leftarrow e^{\mathrm{i} \cdot m \cdot \varphi_B} \cdot G_m, \qquad (26.106)$$

for $m = -M_p, \ldots, M_p$, $m \neq 0$. Note that start point normalization does not require the Fourier descriptor $G$ to be normalized for translation and scale (see Sec. 26.5.1).
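The start point normalization of Eqns. (26.103)-(26.106) can be sketched in Java as follows. This version keeps the exhaustive 400-step search described above rather than an optimizer. It assumes coefficients stored as `double[] {re, im}`, with $G_{-m}$ at index $m \bmod M$ and $M \ge 2M_p + 1$. The class and method names are hypothetical and are not the book's implementation of Alg. 26.7.

```java
/** Start point normalization by maximizing f_p (Eqns. 26.103-26.106), brute-force search. */
final class StartPointInvariance {

    /** "Cross product" z1 x z2 = Re(z1) Im(z2) - Im(z1) Re(z2) (footnote 18). */
    static double cross(double[] z1, double[] z2) {
        return z1[0] * z2[1] - z1[1] * z2[0];
    }

    /** Returns z * e^{i phi}. */
    static double[] rot(double[] z, double phi) {
        double c = Math.cos(phi), s = Math.sin(phi);
        return new double[] {c * z[0] - s * z[1], s * z[0] + c * z[1]};
    }

    /** f_p(phi) = sum_{m=1}^{Mp} [e^{-i m phi} G_{-m}] x [e^{i m phi} G_{+m}] (Eq. 26.103). */
    static double fp(double[][] G, int Mp, double phi) {
        int M = G.length;
        double sum = 0;
        for (int m = 1; m <= Mp; m++) {
            sum += cross(rot(G[Math.floorMod(-m, M)], -m * phi), rot(G[m], m * phi));
        }
        return sum;
    }

    /** Copy of G with G_m <- e^{i m phi} G_m for m = -Mp..Mp, m != 0; G_0 is unchanged. */
    static double[][] shiftPhases(double[][] G, int Mp, double phi) {
        int M = G.length;
        double[][] H = new double[M][];
        for (int i = 0; i < M; i++) H[i] = G[i].clone();
        for (int m = 1; m <= Mp; m++) {
            H[m] = rot(G[m], m * phi);
            H[Math.floorMod(-m, M)] = rot(G[Math.floorMod(-m, M)], -m * phi);
        }
        return H;
    }

    /** Returns {G^A, G^B} using phi_A = argmax of f_p over 400 steps in [0, pi) (Eqns. 26.104-26.106). */
    static double[][][] makeStartPointInvariant(double[][] G, int Mp) {
        final int steps = 400;
        double phiA = 0, fMax = Double.NEGATIVE_INFINITY;
        for (int j = 0; j < steps; j++) {
            double phi = Math.PI * j / steps;
            double f = fp(G, Mp, phi);
            if (f > fMax) {
                fMax = f;
                phiA = phi;
            }
        }
        double phiB = phiA + Math.PI;                          // second candidate, Eq. 26.105
        return new double[][][] {shiftPhases(G, Mp, phiA), shiftPhases(G, Mp, phiB)};
    }
}
```

Because of the $180^\circ$ ambiguity discussed above, a start-shifted copy of a descriptor is normalized either to the same $G^A$ or to the other candidate $G^B$. Matching should therefore compare against both.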


26.5.3  Rotation  Invariance 

After normalizing for starting point, the orientation of the fundamental ellipse (formed by the descriptor pair $(G_{-1}, G_{+1})$) could be assumed to be a reliable reference for global shape rotation. However, for certain shapes (e.g., regular polygons with an even number of sides), $G_{-1}$ may vanish. Therefore, we recover the overall shape orientation from the vector obtained as the weighted sum of all Fourier coefficients, that is,

$$z = \sum_{m=1}^{M_p} \frac{1}{m} \cdot (G_{-m} + G_{+m}), \qquad (26.107)$$

where the factor $1/m$ gives stronger emphasis to the low-frequency coefficients and attenuates the influence of the high-frequency coefficients. The resulting shape orientation estimate is

$$\beta = \angle z = \tan^{-1}\!\left( \frac{\mathrm{Im}(z)}{\mathrm{Re}(z)} \right), \qquad (26.108)$$

where, in practice, the quadrant-aware two-argument arctangent is used.

To  normalize  GA,GB  (obtained  in  Eqn.  (26.106))  for  shape  orienta¬ 
tion,  we  rotate  each  coefficient  (except  G 0)  by  —  /?,  that  is, 


$$
G^A_m \leftarrow G^A_m \cdot e^{-i \cdot \beta}, \qquad G^B_m \leftarrow G^B_m \cdot e^{-i \cdot \beta}, \tag{26.109}
$$


for $m = -M_p, \ldots, M_p$, $m \neq 0$. For a summary of these steps, see procedure MakeRotationInvariant(G) in Alg. 26.6 (lines 16–24).
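A minimal Java sketch of the rotation normalization in Eqns. (26.107)–(26.109) follows. It is not the imagingbook code: the class RotationSketch is hypothetical, the descriptor is assumed to be stored in arrays re[] and im[] of odd length $M = 2 \cdot M_p + 1$ with $G(m)$ at index floorMod(m, M), and the arrays are modified in place. Instead of a plain $\tan^{-1}(\mathrm{Im}(z)/\mathrm{Re}(z))$, the sketch uses Math.atan2 to obtain the full-quadrant angle $\angle z$.

```java
// Sketch of rotation normalization (Eqns. (26.107) to (26.109), MakeRotationInvariant).
// Assumed layout: re[], im[] of odd length M = 2*Mp + 1, G(m) at index floorMod(m, M).
final class RotationSketch {

    // Estimates beta = angle of z = sum_{m=1..Mp} (1/m) * (G(-m) + G(m)),
    // then multiplies every coefficient except G(0) by e^{-i*beta}; returns beta.
    static double makeRotationInvariant(double[] re, double[] im) {
        int M = re.length, Mp = (M - 1) / 2;
        double zr = 0, zi = 0;
        for (int m = 1; m <= Mp; m++) {
            int a = Math.floorMod(-m, M), b = Math.floorMod(m, M);
            zr += (re[a] + re[b]) / m;   // complex addition, weighted by 1/m
            zi += (im[a] + im[b]) / m;
        }
        double beta = Math.atan2(zi, zr);           // full-quadrant angle of z
        double c = Math.cos(-beta), s = Math.sin(-beta);
        for (int m = -Mp; m <= Mp; m++) {
            if (m == 0) continue;                   // G(0) is left unchanged
            int j = Math.floorMod(m, M);
            double r = re[j], i = im[j];
            re[j] = r * c - i * s;
            im[j] = r * s + i * c;
        }
        return beta;
    }
}
```

Rotating a shape by some angle $\theta$ multiplies all coefficients $G_m$, $m \neq 0$, by $e^{i\theta}$, so $z$ rotates by the same angle and the normalized result is unchanged; this is the property checked by the test.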
¹⁹ The accompanying Java implementation uses the class BrentOptimizer from the Apache Commons Math library [4] for this purpose.


Alg. 26.6 Making Fourier descriptors invariant against scale, shift of start point, and shape rotation. For a given Fourier descriptor $G$, procedure MakeStartPointInvariant(G) returns a pair of normalized Fourier descriptors $(G^A, G^B)$, one for each normalized start point phase $\varphi_A$ and $\varphi_B = \varphi_A + \pi$.

1:  MakeInvariant(G)
      Input: G, Fourier descriptor with M_p coefficient pairs.
      Returns a pair of normalized Fourier descriptors G^A, G^B, with a start point phase offset by 180°.
2:    MakeScaleInvariant(G)                        ▷ see below
3:    (G^A, G^B) ← MakeStartPointInvariant(G)      ▷ see Alg. 26.7
4:    MakeRotationInvariant(G^A)                   ▷ see below
5:    MakeRotationInvariant(G^B)
6:    return (G^A, G^B).

7:  MakeScaleInvariant(G)
      Modifies G by unifying its norm and returns the scale factor ν.
8:    s ← 0
9:    for m ← 1, ..., M_p do
10:     s ← s + |G(−m)|² + |G(m)|²
11:   ν ← 1/√s
12:   for m ← 1, ..., M_p do
13:     G(−m) ← ν · G(−m)
14:     G(m) ← ν · G(m)
15:   return ν.

16: MakeRotationInvariant(G)
      Modifies G and returns the estimated rotation angle β.
17:   z ← 0 + i·0                                  ▷ z ∈ ℂ
18:   for m ← 1, ..., M_p do
19:     z ← z + (1/m) · (G(−m) + G(m))             ▷ complex addition!
20:   β ← ∠z
21:   for m ← 1, ..., M_p do                       ▷ rotate all coefficients by −β
22:     G(−m) ← e^(−i·β) · G(−m)
23:     G(m) ← e^(−i·β) · G(m)
24:   return β.
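Procedure MakeScaleInvariant in Alg. 26.6 can be sketched in Java as shown next. The class ScaleSketch is hypothetical; it assumes a descriptor stored in arrays re[] and im[] of odd length $M = 2 \cdot M_p + 1$ with $G(m)$ at index floorMod(m, M), and it modifies the arrays in place, leaving $G(0)$ untouched as in the algorithm.

```java
// Sketch of MakeScaleInvariant (Alg. 26.6, lines 7-15).
// Assumed layout: re[], im[] of odd length M = 2*Mp + 1, G(m) at index floorMod(m, M).
final class ScaleSketch {

    // Scales all coefficients except G(0) such that sum_{m != 0} |G(m)|^2 = 1
    // and returns the applied factor v = 1 / sqrt(s).
    static double makeScaleInvariant(double[] re, double[] im) {
        int M = re.length, Mp = (M - 1) / 2;
        double s = 0;
        for (int m = 1; m <= Mp; m++) {
            int a = Math.floorMod(-m, M), b = Math.floorMod(m, M);
            s += re[a] * re[a] + im[a] * im[a] + re[b] * re[b] + im[b] * im[b];
        }
        double v = 1.0 / Math.sqrt(s);
        for (int m = 1; m <= Mp; m++) {
            int a = Math.floorMod(-m, M), b = Math.floorMod(m, M);
            re[a] *= v;
            im[a] *= v;
            re[b] *= v;
            im[b] *= v;
        }
        return v;
    }
}
```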


26.5.4  Other  Approaches 


The normalization described above for making Fourier descriptors invariant to geometric transformations deviates from the published "classic" techniques in certain ways, but also adopts some common elements. As representative examples, we briefly discuss two of these techniques (already referenced earlier) in the following.

Persoon and Fu [183, 184] proposed (in what they call the "suboptimal" approach) to choose the parameters $s$ (common scale factor), $\beta$ (shape rotation), and $\varphi_s$ (start point phase) such that the modified coefficients $G'_{-1}, G'_{+1}$ are both imaginary and $|G'_{-1} + G'_{+1}| = 1$. As argued in [245], this method leaves a ±180° ambiguity for the shape orientation. Also, it requires that both $G_{-1}, G_{+1}$ have significant magnitude, which may not be true for $G_{-1}$ in the case of rotationally symmetric shapes (e.g., equilateral triangles, squares, pentagons, etc.).

Alg. 26.7 Making Fourier descriptors invariant to the shift of start point. Since the result is ambiguous by 180°, two normalized descriptors $(G^A, G^B)$ are returned, with the start point phase set to $\varphi_A$ and $\varphi_A + \pi$, respectively.

1:  MakeStartPointInvariant(G)
      Input: G, Fourier descriptor with M_p coefficient pairs.
      Returns a pair of new Fourier descriptors G^A, G^B, normalized to the start point phase φ_A and φ_A + π, respectively.
2:    φ_A ← GetStartPointPhase(G)                  ▷ see below
3:    G^A ← ShiftStartPointPhase(G, φ_A)           ▷ see below
4:    G^B ← ShiftStartPointPhase(G, φ_A + π)
5:    return (G^A, G^B).

6:  GetStartPointPhase(G)
      Returns φ maximizing f_p(G, φ), with φ ∈ [0, π). The maximum is found by simple brute-force search (for illustration only).
7:    c_max ← −∞
8:    φ_max ← 0
9:    K ← 400                                      ▷ do K search steps over 0, ..., π
10:   for k ← 0, ..., K − 1 do                     ▷ find φ maximizing f_p(G, φ)
11:     φ ← π · k/K
12:     c ← f_p(G, φ)
13:     if c > c_max then
14:       c_max ← c
15:       φ_max ← φ
16:   return φ_max.

17: f_p(G, φ)                                      ▷ see Eq. 26.103
18:   s ← 0
19:   for m ← 1, ..., M_p do
20:     z_1 ← G(−m) · e^(−i·m·φ)
21:     z_2 ← G(m) · e^(i·m·φ)
22:     s ← s + Re(z_1) · Im(z_2) − Im(z_1) · Re(z_2)   ▷ = s + (z_1 ⊗ z_2)
23:   return s.

24: ShiftStartPointPhase(G, φ)                     ▷ start-point normalize G by φ
25:   G' ← Duplicate(G)
26:   for m ← 1, ..., M_p do
27:     G'(−m) ← G(−m) · e^(−i·m·φ)
28:     G'(m) ← G(m) · e^(i·m·φ)
29:   return G'.

Wallace and Wintz [245] use $|G_{+1}|$ as the common scale factor, because the coefficient $G_{+1}$ typically has the largest magnitude. The phase of $G_{+1}$, denoted $\varphi_1 = \angle G_{+1}$, and the phase $\varphi_k = \angle G_k$ of another coefficient $G_k$ ($k > 0$) with the second-largest magnitude are used to compensate for rotation and starting point. Coefficients are phase shifted such that both $G'_{+1}$ and $G'_k$ have zero phase. This is accomplished by multiplying all coefficients in the form


$$
G'_m = G_m \cdot e^{\,i \cdot [(m-k) \cdot \varphi_1 + (1-m) \cdot \varphi_k]/(k-1)} \tag{26.110}
$$

for $-\frac{M}{2} + 1 \le m \le \frac{M}{2}$ (also used in [189]). Depending on the index $k$ of the second-largest coefficient, there exist $|k - 1|$ different orientation/start point combinations to obtain zero phase in $G'_{+1}$ and $G'_k$. If $k = 2$, then $|k - 1| = 1$, thus the solution is unique and Eqn. (26.110) simplifies to


$$
G'_m = G_m \cdot e^{\,i \cdot [(m-2) \cdot \varphi_1 + (1-m) \cdot \varphi_2]} \tag{26.111}
$$

with $\varphi_2 = \angle G_2$.²⁰ Otherwise, the ambiguity is resolved by calculating an "ambiguity-resolving" criterion for each of the $|k - 1|$ solutions, for example, the amount of "positive real energy",


$$
\sum_{m=1}^{N-1} \mathrm{Re}(G'_m) \cdot \lvert \mathrm{Re}(G'_m) \rvert,
$$


as defined in [245] (other functions were suggested in [189]). This leaves the problem that, for matching, the normalization of the investigated shape descriptor must be based on the same set of dominant coefficients as the reference descriptor. Alternatively, one could memorize the relevant coefficient indexes for every reference descriptor, but then different normalizations must be applied for matching against multiple models in a database.

²⁰ Unfortunately, the general use of coefficient $G_2$ as a phase reference is critical, because the magnitude of $G_2$ may be small or even zero for certain symmetrical shapes (including all regular polygons with an even number of sides).

Fig. 26.17 Start point normalization under varying shape rotation ($\beta$). The real start point (which varies with shape rotation) is marked by a black dot. The two normalized start points $\varphi_A$ and $\varphi_B = \varphi_A + \pi$ (calculated with the procedure in Alg. 26.7) are marked by a blue and a brown X, respectively. Twenty-five Fourier coefficient pairs are used for the normalization and shape reconstruction. Inaccuracies are due to shape variations caused by the use of nearest-neighbor interpolation for the image rotation.

Fig. 26.18 Reconstruction of various shapes from Fourier descriptors normalized for start point shift and shape rotation. The blue shapes (rows 1, 3) correspond to the normalized Fourier descriptors $G^A$ with start point phase $\varphi_A$. The brown shapes (rows 2, 4) correspond to the normalized Fourier descriptors $G^B$ with start point phase $\varphi_B = \varphi_A + \pi$. No scale normalization was applied for better visualization.
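For comparison with the normalization of Alg. 26.6, the Wallace–Wintz phase normalization of Eqn. (26.110) can be sketched in Java. This is a hypothetical illustration under several assumptions: the descriptor is stored in arrays re[] and im[] of odd length $M = 2 \cdot M_p + 1$ with $G(m)$ at index floorMod(m, M), the multiplication is applied to $m = -M_p, \ldots, M_p$ rather than to the index range stated with Eqn. (26.110), the caller supplies a valid index $k \neq 1$, and only one of the $|k - 1|$ possible solutions is produced (no ambiguity-resolving criterion and no scale normalization by $|G_{+1}|$).

```java
// Sketch of the Wallace-Wintz phase normalization, Eqn. (26.110), for a given k != 1.
// Assumed layout: re[], im[] of odd length M = 2*Mp + 1, G(m) at index floorMod(m, M).
// Only one of the |k - 1| possible solutions is computed; class name is hypothetical.
final class WallaceWintzSketch {

    static double[][] normalize(double[] re, double[] im, int k) {
        int M = re.length, Mp = (M - 1) / 2;
        int j1 = Math.floorMod(1, M), jk = Math.floorMod(k, M);
        double phi1 = Math.atan2(im[j1], re[j1]);   // phase of G(+1)
        double phik = Math.atan2(im[jk], re[jk]);   // phase of G(k)
        double[] r2 = re.clone(), i2 = im.clone();
        for (int m = -Mp; m <= Mp; m++) {
            int j = Math.floorMod(m, M);
            // exponent angle [(m - k) * phi1 + (1 - m) * phik] / (k - 1)
            double a = ((m - k) * phi1 + (1 - m) * phik) / (k - 1);
            double c = Math.cos(a), s = Math.sin(a);
            r2[j] = re[j] * c - im[j] * s;
            i2[j] = re[j] * s + im[j] * c;
        }
        return new double[][] { r2, i2 };
    }
}
```

Substituting $m = 1$ and $m = k$ into the exponent gives $-\varphi_1$ and $-\varphi_k$, respectively, so both $G'_{+1}$ and $G'_k$ end up real and positive, which is what the test checks for several values of $k$.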


26.6  Shape  Matching  with  Fourier  Descriptors 

A typical use of Fourier descriptors is to see if a given shape is identical or similar to an exemplar contained in a database of reference shapes. For this purpose, we need to define a distance measure that quantifies the difference between two Fourier shape descriptors $G_1$ and $G_2$. In the following, we assume that the Fourier descriptors $G_1, G_2$ are at least scale-normalized (as described in Alg. 26.6) and of identical length, each with $M_p$ coefficient pairs.

26.6.1  Magnitude-Only  Matching 

In the simplest case, we only use the magnitude of the Fourier coefficients for comparison and entirely ignore their phase, using the distance function


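$$
\begin{aligned}
\mathrm{dist}_M(G_1, G_2) &= \Bigl[\sum_{\substack{m=-M_p \\ m \neq 0}}^{M_p} \bigl(|G_1(m)| - |G_2(m)|\bigr)^2\Bigr]^{1/2} \\
&= \Bigl[\sum_{m=1}^{M_p} \bigl(|G_1(-m)| - |G_2(-m)|\bigr)^2 + \bigl(|G_1(m)| - |G_2(m)|\bigr)^2\Bigr]^{1/2},
\end{aligned}
\tag{26.112}
$$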
where $M_p$ denotes the number of FD pairs used for matching. Note that Eqn. (26.112) is simply the $L_2$ norm of the magnitude difference vector, and of course other norms (such as $L_1$ or $L_\infty$) could be used as well. The advantage of the magnitude-only approach is that no normalization (except for scale) is required. Its drawback is that even highly dissimilar shapes might be mistakenly matched, since the removal of phase naturally eliminates shape information that is possibly essential for discrimination. As demonstrated in Fig. 26.19, a given Fourier magnitude vector may correspond to a great diversity of shapes, and thus the subspace of "equivalent" shapes defined by the magnitude-only distance $\mathrm{dist}_M$ is quite large.
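The magnitude-only distance of Eqn. (26.112) amounts to a few lines of Java. In this minimal sketch the class MagnitudeDistance is hypothetical, both descriptors are assumed to be stored in arrays re[] and im[] with $G(m)$ at index floorMod(m, M), and $M_p$ must not exceed $(M-1) \div 2$ for either descriptor.

```java
// Sketch of the magnitude-only distance dist_M, Eqn. (26.112).
// Assumed layout: re[], im[] with G(m) at index floorMod(m, M); G(0) is ignored.
final class MagnitudeDistance {

    static double abs(double[] re, double[] im, int m) {
        int j = Math.floorMod(m, re.length);
        return Math.hypot(re[j], im[j]);
    }

    static double distM(double[] re1, double[] im1, double[] re2, double[] im2, int Mp) {
        double sum = 0;
        for (int m = 1; m <= Mp; m++) {
            double dn = abs(re1, im1, -m) - abs(re2, im2, -m);  // negative harmonic
            double dp = abs(re1, im1, m) - abs(re2, im2, m);    // positive harmonic
            sum += dn * dn + dp * dp;
        }
        return Math.sqrt(sum);
    }
}
```

Two descriptors that differ only in the phases of their coefficients have distance zero under this measure, which is exactly the weakness illustrated in Fig. 26.19.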


Fig. 26.19 Magnitude-only reconstruction (randomized phase). Reconstruction of shapes from Fourier descriptors with the phase of all coefficients (except $G_{-1}$, $G_0$, and $G_{+1}$) individually randomized. Note that the magnitude of the coefficients is exactly the same for each shape category, so all blue shapes would be considered "equivalent" to the original shape (first column) by a magnitude-only matcher.


Nevertheless, magnitude-only matching may be sufficient in situations where the reference shapes are not too similar. In a sense, the operation of reducing the complex-valued Fourier descriptors to their magnitude vectors can be viewed as a hash function. While potentially many different shapes may produce (i.e., "hash to") similar Fourier magnitude vectors, the chance of two real shapes mapping to the same vector (and thus being confused) may be relatively small. Thus, particularly considering its simplicity (only scale normalization of descriptors is required), magnitude-based matching can be quite effective in practice.

Figure 26.20 shows the pair-wise magnitude-only distances (blue cells, values are $10 \times \mathrm{dist}_M$) between various sample shapes. The corresponding intra-class distances, given in Fig. 26.21, are typically more than one order of magnitude smaller, indicating that shape discrimination based on this measure should be fairly reliable.


26.6.2  Complex  (Phase-Preserving)  Matching 


Assuming that the Fourier descriptors $G_1$ and $G_2$ have been normalized for scale, start point shift, and shape rotation (see Alg. 26.6), we can use the following function to measure their mutual distance:
Fig. 26.20 Inter-class Fourier descriptor distances (magnitude-only and complex-valued). Numbers inside the green fields (lower-left half of the matrix) are the magnitude-only distances $\mathrm{dist}_M$ (see Eqn. (26.112)). Numbers in blue fields (upper-right half of the matrix) are the complex-valued distances $\mathrm{dist}_C$ (see Eqn. (26.114)). Shapes were sampled uniformly at 125 contour positions, with 25 coefficient pairs. Fourier descriptors were normalized for scale, start point, and rotation. All distance values are multiplied by 10.


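\begin{align}
\mathrm{dist}_C(G_1, G_2) &= \Bigl[\sum_{\substack{m=-M_p \\ m \neq 0}}^{M_p} |G_1(m) - G_2(m)|^2\Bigr]^{1/2} \tag{26.113}\\
&= \Bigl[\sum_{m=1}^{M_p} |G_1(-m) - G_2(-m)|^2 + |G_1(m) - G_2(m)|^2\Bigr]^{1/2} \tag{26.114}\\
&= \Bigl[\sum_{\substack{m=-M_p \\ m \neq 0}}^{M_p} \bigl[\mathrm{Re}(G_1(m)) - \mathrm{Re}(G_2(m))\bigr]^2 + \bigl[\mathrm{Im}(G_1(m)) - \mathrm{Im}(G_2(m))\bigr]^2\Bigr]^{1/2} \tag{26.115}
\end{align}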


Again, this is simply the $L_2$ norm of the complex-valued difference vector $G_1 - G_2$ (ignoring the coefficients at $m = 0$), which could be replaced by some other norm. Since the phase of the involved coefficients is fully preserved, a zero distance between two Fourier descriptors means that they represent the very same shape. Thus the set of equivalent shapes defined by the distance function in Eqn. (26.114) is much smaller than the one defined by the magnitude-only distance in Eqn. (26.112). Consequently, the probability of two different shapes being confused for the same is also significantly smaller with this distance measure.
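The complex-valued distance of Eqns. (26.113)–(26.115) can be sketched in Java as follows; the squared variant distC2 avoids the square root, as done later for matching. The class ComplexDistance is hypothetical, descriptors are assumed to be stored in arrays re[] and im[] with $G(m)$ at index floorMod(m, M), and the two descriptors may have different lengths as long as $M_p$ does not exceed $(M-1) \div 2$ for either.

```java
// Sketch of the complex-valued distance dist_C, Eqns. (26.113) to (26.115).
// Assumed layout: re[], im[] with G(m) at index floorMod(m, M); lengths may differ.
final class ComplexDistance {

    // squared distance over harmonics -Mp..Mp, skipping m = 0 (translation)
    static double distC2(double[] re1, double[] im1, double[] re2, double[] im2, int Mp) {
        int M1 = re1.length, M2 = re2.length;
        double d = 0;
        for (int m = -Mp; m <= Mp; m++) {
            if (m == 0) continue;
            int j1 = Math.floorMod(m, M1), j2 = Math.floorMod(m, M2);
            double dr = re1[j1] - re2[j2], di = im1[j1] - im2[j2];
            d += dr * dr + di * di;
        }
        return d;
    }

    static double distC(double[] re1, double[] im1, double[] re2, double[] im2, int Mp) {
        return Math.sqrt(distC2(re1, im1, re2, im2, Mp));
    }
}
```

Unlike $\mathrm{dist}_M$, two descriptors with equal magnitudes but different phases (for example $G(1) = 3$ versus $G(1) = 3i$) obtain a non-zero distance here.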


Fig. 26.21 Intra-class Fourier descriptor distances (magnitude-only and complex-valued). The reference images (0° column) were rotated by angle $\alpha$ (multiples of 17°), using no (i.e., nearest-neighbor) interpolation. Numbers inside the blue fields are the magnitude-only distances $\mathrm{dist}_M$ (see Eqn. (26.112)). Numbers inside the green fields are the complex-valued distances $\mathrm{dist}_C$ (see Eqn. (26.114)). Shapes were sampled uniformly at 125 contour positions, with 25 coefficient pairs. Fourier descriptors were normalized for scale, start point shift, and shape rotation. All distance values are multiplied by 10. Note that all intra-class distances are roughly one order of magnitude smaller than the inter-class distances shown in Fig. 26.20.


Complex inter-class and intra-class distance values for the set of sample shapes are listed in Figs. 26.20 and 26.21. Notice that, with the normalization described in Alg. 26.6, the complex intra-class distance values in Fig. 26.21 (which should be as small as possible) are typically about twice as large as the corresponding magnitude-only distance values, but still an order of magnitude smaller than comparable inter-class values in Fig. 26.20, so reliable shape discrimination should be possible.

The price paid for the increased discriminative power is the extra work necessary for normalizing the Fourier descriptors for start point and shape rotation (in addition to scale), as described in Alg. 26.6. Note that this involves the comparison with two normalized descriptors to cope with the unresolved 180° ambiguity of the start point normalization (see Eqns. (26.104) and (26.105)). For example, assume we wish to compare two shapes $V_1, V_2$ with Fourier descriptors $G_1, G_2$, respectively. We first calculate the corresponding invariant descriptors (as described in Alg. 26.6),
$$
\begin{aligned}
(G_1^A, G_1^B) &\leftarrow \mathrm{MakeInvariant}(G_1), \\
(G_2^A, G_2^B) &\leftarrow \mathrm{MakeInvariant}(G_2).
\end{aligned}
\tag{26.116}
$$


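Now we use Eqn. (26.114) to calculate the complex-valued distance as

$$
d_{\min} = \min\bigl(\mathrm{dist}_C(G_1^A, G_2^A),\ \mathrm{dist}_C(G_1^A, G_2^B)\bigr) \tag{26.117}
$$

or, alternatively, as

$$
d_{\min} = \min\bigl(\mathrm{dist}_C(G_1^A, G_2^A),\ \mathrm{dist}_C(G_1^B, G_2^A)\bigr). \tag{26.118}
$$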

Note that, in any case, the resulting distance $d_{\min}$ will be small only if the two shapes $V_1, V_2$ are really similar. This also means that we only need to store one of the two normalized Fourier descriptors (for example, $G^A_{\mathrm{ref}}$) for each reference shape $V_{\mathrm{ref}}$ and then (following Eqn. (26.117)) compare it to both normalized descriptors $G^A_{\mathrm{new}}$ and $G^B_{\mathrm{new}}$ of any new shape $V_{\mathrm{new}}$.²¹

To illustrate this idea, Alg. 26.8 shows the construction of a simple Fourier descriptor database from a set of reference shapes and its subsequent use for classifying unknown shapes. First, procedure MakeFdDataBase($V_{\mathrm{ref}}$, $M'$) returns a map $R$ holding a normalized Fourier descriptor for each of the reference shapes given in $V_{\mathrm{ref}}$. Matching a new shape $V_{\mathrm{new}}$ to the entries in the database $R$ is accomplished by procedure FindBestMatch($V_{\mathrm{new}}$, $M'$, $R$, $d_{\max}$), which returns the index of the best-fitting shape in $R$, or nil if the distance of the closest match exceeds the predefined threshold $d_{\max}$. As is common in this situation, we use squared distance values (i.e., $\mathrm{dist}_C^2$) for matching in Alg. 26.8 (lines 15–18), thereby avoiding the square root operations in Eqns. (26.112) and (26.114).
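The matching step of Alg. 26.8 can be sketched in Java as below. This hypothetical FdMatcher works on descriptors that are assumed to be already normalized by MakeInvariant (the normalization itself is not repeated); each descriptor is a double[][] {re, im} with $G(m)$ at index floorMod(m, M), and -1 plays the role of nil. As in Alg. 26.8, only $G^A$ is stored per reference shape and squared distances are compared, so dMax must be given as a squared distance.

```java
import java.util.List;

// Sketch of FindBestMatch (Alg. 26.8) on already normalized descriptors.
// Assumed layout: double[][] {re, im}, G(m) at index floorMod(m, M); -1 stands for nil.
final class FdMatcher {

    // squared complex distance D2 over Mp = (min(M1, M2) - 1) / 2 pairs, m != 0
    static double d2(double[][] g1, double[][] g2) {
        int M1 = g1[0].length, M2 = g2[0].length;
        int Mp = (Math.min(M1, M2) - 1) / 2;
        double d = 0;
        for (int m = -Mp; m <= Mp; m++) {
            if (m == 0) continue;
            int j1 = Math.floorMod(m, M1), j2 = Math.floorMod(m, M2);
            double dr = g1[0][j1] - g2[0][j2], di = g1[1][j1] - g2[1][j2];
            d += dr * dr + di * di;
        }
        return d;
    }

    // compares both normalized versions of the new shape with each stored G^A_ref
    static int findBestMatch(double[][] gANew, double[][] gBNew,
                             List<double[][]> refs, double dMax) {
        double dMin = Double.POSITIVE_INFINITY;
        int iMin = -1;
        for (int i = 0; i < refs.size(); i++) {
            double[][] gRef = refs.get(i);
            double d = Math.min(d2(gANew, gRef), d2(gBNew, gRef));
            if (d < dMin) {
                dMin = d;
                iMin = i;
            }
        }
        return (dMin <= dMax) ? iMin : -1;   // dMax is a squared distance
    }
}
```

Under these assumptions, MakeFdDataBase reduces to collecting the $G^A$ part of each normalized reference descriptor in a List.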


26.7  Java  Implementation 

The algorithms described in this chapter have been implemented as part of the open imagingbook library,²² which is available at the book's accompanying website. As usual, most Java methods are named and structured identically to the procedures defined in the various algorithms for easy identification.


FourierDescriptor  (class) 


This is the main class of this package; it holds all data structures and implements the functionality common to all Fourier descriptors, including methods for shape reconstruction, invariance, and matching, as will be described here.


²¹ The justification for keeping only one of the two normalized descriptors $G^A_{\mathrm{ref}}, G^B_{\mathrm{ref}}$ of each reference shape $V_{\mathrm{ref}}$ is that if two candidate shapes $V_1, V_2$ are similar, then the normalization will produce pairs of Fourier descriptors $(G_1^A, G_1^B)$ and $(G_2^A, G_2^B)$ that are also similar but not necessarily in the same order. Therefore $G_1^A$ must only match with either $G_2^A$ or $G_2^B$ to detect the similarity of $V_1$ and $V_2$.

²² Package imagingbook.pub.fd.


Alg. 26.8 Simple shape matching with a database of Fourier descriptors. MakeFdDataBase($V_{\mathrm{ref}}$, $M'$) creates and returns a new database (map) $R$ from a sequence of reference shapes $V_{\mathrm{ref}}$. $R$ can then be passed to FindBestMatch($V_{\mathrm{new}}$, $M'$, $R$, $d_{\max}$) for classifying a new shape $V_{\mathrm{new}}$, where $d_{\max}$ is a predefined distance threshold.

1:  MakeFdDataBase(V_ref, M')
      Input: V_ref = (V_0, V_1, ..., V_{N_R−1}), a sequence of reference shapes; M', the number of Fourier coefficients. Returns a sequence of model Fourier descriptors for the reference shapes in V_ref.
2:    N_R ← |V_ref|
3:    R ← new map of Fourier descriptors over [0, N_R − 1]
4:    for i ← 0, ..., N_R − 1 do
5:      G ← FourierDescriptorUniform(V_ref(i), M')      ▷ Alg. 26.3
6:      (G^A, G^B) ← MakeInvariant(G)                   ▷ Alg. 26.6
7:      R(i) ← G^A                  ▷ store only one normalized descriptor (G^A)
8:    return R.

9:  FindBestMatch(V_new, M', R, d_max)
      Input: V_new, a new shape; M', the number of Fourier coefficients; R, a sequence of reference Fourier descriptors; d_max, maximum squared distance acceptable for a positive match. Returns the best-matching shape index i_min or nil if no acceptable match was found.
10:   G_new ← FourierDescriptorUniform(V_new, M')       ▷ Alg. 26.3
11:   (G^A_new, G^B_new) ← MakeInvariant(G_new)         ▷ Alg. 26.6
12:   d_min ← ∞, i_min ← −1
13:   for i ← 0, ..., |R| − 1 do
14:     G^A_ref ← R(i)
15:     d_2 ← min(D2(G^A_new, G^A_ref), D2(G^B_new, G^A_ref))   ▷ Eq. 26.118
16:     if d_2 < d_min then
17:       d_min ← d_2
18:       i_min ← i
19:   if d_min ≤ d_max then
20:     return i_min                                    ▷ best match index is i_min
21:   else
22:     return nil.                                     ▷ no matching shape found in R

23: D2(G_1, G_2)
      Returns the squared complex distance dist_C(G_1, G_2) between the Fourier descriptors G_1, G_2 (see Eq. 26.114).
24:   d ← 0, M_p ← (min(|G_1|, |G_2|) − 1) ÷ 2
25:   for m ← −M_p, ..., M_p, m ≠ 0 do
26:     d ← d + [Re(G_1(m)) − Re(G_2(m))]² + [Im(G_1(m)) − Im(G_2(m))]²
27:   return d.                                         ▷ d = (dist_C(G_1, G_2))²


Class FourierDescriptor is abstract and thus cannot be instantiated. To create Fourier descriptor objects, one of the concrete subclasses FourierDescriptorUniform or FourierDescriptorFromPolygon (discussed later in this section) may be used, which provide the appropriate constructors. FourierDescriptor provides the following methods for both types of Fourier descriptors.

Access  to  Fourier  coefficients 

Complex[] getCoefficients()

Returns the complete vector of complex-valued Fourier coefficients.²³

²³ The class Complex is defined in package imagingbook.lib.math.

Complex getCoefficient(int m)

Returns the value of the Fourier coefficient $G(m \bmod M)$, with $M = |G|$ as above.

Complex setCoefficient(int m, Complex z)

Replaces the Fourier coefficient $G(m \bmod M)$ by the complex value z, with $M = |G|$ as above.

Complex setCoefficient(int m, double a, double b)

Replaces the Fourier coefficient $G(m \bmod M)$ by the complex value $z = a + i \cdot b$, with $M = |G|$ as above.

int size()

Returns the length (M) of the Fourier descriptor.

int getMaxNegHarmonic()

Returns the maximum negative harmonic $m = -((M - 1) \div 2)$ for this Fourier descriptor (of length M).

int getMaxPosHarmonic()

Returns the maximum positive harmonic $m = M \div 2$ for this Fourier descriptor (of length M).

int getMaxCoefficientPairs()

Returns the maximum number of coefficient pairs, $(M - 1) \div 2$, for this Fourier descriptor (of length M).

void truncate(int Mp)

Truncates this Fourier descriptor to the Mp lowest-frequency coefficient pairs (see Eqn. (26.23)).
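The index conventions behind getCoefficient(), getMaxNegHarmonic(), getMaxPosHarmonic(), getMaxCoefficientPairs(), and truncate() can be illustrated with the following minimal Java sketch. It is not the imagingbook code: the class FdIndexing is hypothetical and operates on plain re[]/im[] arrays instead of Complex objects. Note that Java's % operator yields negative results for negative operands (e.g., -1 % 8 == -1), so Math.floorMod is used to compute $m \bmod M$.

```java
// Sketch of descriptor index arithmetic (hypothetical helpers, not the library code).
// A descriptor of length M is stored in re[], im[], with G(m) at index floorMod(m, M).
final class FdIndexing {

    // G(m mod M): floorMod keeps the index in [0, M) also for negative m
    static int index(int m, int M) {
        return Math.floorMod(m, M);
    }

    static int maxNegHarmonic(int M) {
        return -((M - 1) / 2);          // integer division
    }

    static int maxPosHarmonic(int M) {
        return M / 2;
    }

    static int maxCoefficientPairs(int M) {
        return (M - 1) / 2;
    }

    // Keeps G(-Mp), ..., G(Mp) in new arrays of length 2*Mp + 1.
    // Assumes Mp <= maxCoefficientPairs(M), otherwise indices would alias.
    static double[][] truncate(double[] re, double[] im, int Mp) {
        int M = re.length, M2 = 2 * Mp + 1;
        double[] r2 = new double[M2], i2 = new double[M2];
        for (int m = -Mp; m <= Mp; m++) {
            r2[index(m, M2)] = re[index(m, M)];
            i2[index(m, M2)] = im[index(m, M)];
        }
        return new double[][] { r2, i2 };
    }
}
```

For $M = 8$, for instance, the harmonics range from $-3$ to $4$, and only 3 complete coefficient pairs are available, because the single coefficient $G(4)$ has no negative partner.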


Comparing  Fourier  descriptors 

double distanceComplex(FourierDescriptor fd2)

Returns the complex-valued distance ($\mathrm{dist}_C(G_1, G_2)$, see Eqn. (26.114)) between this Fourier descriptor ($G_1$) and another Fourier descriptor fd2 ($G_2$). The zero-frequency coefficients $G(0)$ are ignored.

double distanceComplex(FourierDescriptor fd2, int Mp)

As above, but using only Mp coefficient pairs (see Eqn. (26.114)).

double distanceMagnitude(FourierDescriptor fd2)

Returns the magnitude-only distance ($\mathrm{dist}_M(G_1, G_2)$, see Eqn. (26.112)) between this Fourier descriptor ($G_1$) and another Fourier descriptor fd2 ($G_2$). The zero-frequency coefficients $G(0)$ are ignored.

double distanceMagnitude(FourierDescriptor fd2, int Mp)

As above, but using only Mp coefficient pairs (see Eqn. (26.112)).


Shape  reconstruction 

Complex[] getReconstruction(int N)

Returns the shape reconstructed from the complete Fourier descriptor as a sequence of N complex-valued contour points. The contour points are obtained by evaluating getReconstructionPoint(t) at uniformly spaced positions $t \in [0, 1)$.

Complex[] getReconstruction(int N, int Mp)

Returns a partial shape reconstruction from Mp Fourier coefficient pairs as a sequence of N complex-valued contour points.

Complex getReconstructionPoint(double t)

Returns a single point (as a complex value) on the continuous contour for path parameter $t \in [0, 1)$, reconstructed from the complete Fourier descriptor (see Eqn. (26.20)).

Complex getReconstructionPoint(double t, int Mp)

Returns a single point (as a complex value) on the continuous contour for path parameter $t \in [0, 1)$, reconstructed from Mp Fourier coefficient pairs.
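A partial reconstruction in the spirit of getReconstructionPoint(t, Mp) and getReconstruction(N, Mp) can be sketched in Java as follows. The form of Eqn. (26.20) is assumed rather than reproduced here: the sketch evaluates $x(t) = \sum_{m=-M_p}^{M_p} G(m) \cdot e^{\,i 2\pi m t}$, i.e., it assumes that the factor $1/M$ is applied in the forward DFT and not in the reconstruction. The class ReconstructionSketch is hypothetical, coefficients are stored in re[]/im[] with $G(m)$ at index floorMod(m, M), and points are returned as {x, y} pairs instead of Complex values.

```java
// Sketch of partial shape reconstruction from Mp coefficient pairs.
// Assumption: x(t) = sum_{m=-Mp..Mp} G(m) * e^{i*2*pi*m*t}, t in [0, 1),
// i.e., no 1/M factor in the inverse transform. Names are hypothetical.
final class ReconstructionSketch {

    // returns {Re, Im} of the reconstructed contour point at path parameter t
    static double[] point(double[] re, double[] im, double t, int Mp) {
        int M = re.length;
        double x = 0, y = 0;
        for (int m = -Mp; m <= Mp; m++) {
            int j = Math.floorMod(m, M);
            double w = 2 * Math.PI * m * t;
            double c = Math.cos(w), s = Math.sin(w);
            x += re[j] * c - im[j] * s;   // Re(G(m) * e^{i w})
            y += re[j] * s + im[j] * c;   // Im(G(m) * e^{i w})
        }
        return new double[] { x, y };
    }

    // N uniformly spaced samples at t = k / N, k = 0, ..., N - 1
    static double[][] reconstruct(double[] re, double[] im, int N, int Mp) {
        double[][] pts = new double[N][];
        for (int k = 0; k < N; k++) {
            pts[k] = point(re, im, (double) k / N, Mp);
        }
        return pts;
    }
}
```

A descriptor with only $G(0) = 2 + i$ and $G(1) = 3$ reconstructs a circle of radius 3 around the point $(2, 1)$, which the test below uses as a check.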




Normalization 

FourierDescriptor[] makeInvariant()

Returns a pair of Fourier descriptors $(G^A, G^B)$ that are normalized for scale, start point shift, and shape rotation (see Alg. 26.6).

double makeRotationInvariant()

Normalizes the Fourier descriptor for shape rotation by phase-shifting all coefficients (see Alg. 26.6). Returns the estimated rotation angle $\beta$.

double makeScaleInvariant()

Normalizes the Fourier descriptor for scale by multiplying with a common factor, such that the $L_2$ norm of the resulting vector is 1. Returns the scale factor that was applied for normalization.

FourierDescriptor[] makeStartPointInvariant()

Returns a pair of normalized Fourier descriptors $(G^A, G^B)$, one for each of the start point normalization angles $\varphi_A$ and $\varphi_B = \varphi_A + \pi$, respectively (see Alg. 26.7).

void makeTranslationInvariant()

Modifies this Fourier descriptor by setting the coefficient $G(0)$ to zero. This method is rarely needed because $G(0)$ is ignored for matching.


FourierDescriptorUniform (class)

This subclass of FourierDescriptor represents Fourier descriptors obtained from uniformly sampled contours, as described in Alg. 26.2. It provides the constructor methods

FourierDescriptorUniform(Point2D[] V),
FourierDescriptorUniform(Point2D[] V, int Mp),

where V is a sequence of M contour points (Point2D), assumed to be uniformly sampled. The first constructor creates a full Fourier descriptor with M coefficients (see Alg. 26.2). The second constructor creates a Fourier descriptor with Mp coefficient pairs (i.e., $2 \cdot M_p + 1$ coefficients), as described in Alg. 26.3.
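To show what a descriptor of a uniformly sampled contour contains, here is a plain $O(M^2)$ DFT sketch in Java. It assumes that the forward transform carries the factor $1/M$, $G(m) = \frac{1}{M} \sum_{k=0}^{M-1} g(k) \cdot e^{-i 2\pi m k / M}$ with $g(k) = x_k + i \cdot y_k$; this normalization is an assumption of the sketch, as are the hypothetical class UniformFdSketch, the double[M][2] point format (instead of Point2D[]), and the re[]/im[] output layout with $G(m)$ at index floorMod(m, M). The partial variant simply copies the $2 \cdot M_p + 1$ lowest harmonics and requires $M_p \le (M-1) \div 2$.

```java
// Sketch of a Fourier descriptor from M uniformly sampled contour points.
// Assumption: forward DFT with factor 1/M, g(k) = x_k + i*y_k. Names are hypothetical.
final class UniformFdSketch {

    // full descriptor {re, im}, each of length M
    static double[][] fullDescriptor(double[][] pts) {
        int M = pts.length;
        double[] re = new double[M], im = new double[M];
        for (int m = 0; m < M; m++) {
            for (int k = 0; k < M; k++) {
                double w = 2 * Math.PI * m * k / M;
                double c = Math.cos(w), s = Math.sin(w);
                re[m] += pts[k][0] * c + pts[k][1] * s;   // Re((x + iy) * e^{-iw})
                im[m] += pts[k][1] * c - pts[k][0] * s;   // Im((x + iy) * e^{-iw})
            }
            re[m] /= M;
            im[m] /= M;
        }
        return new double[][] { re, im };
    }

    // descriptor with Mp pairs only (length 2*Mp + 1), holding G(-Mp), ..., G(Mp)
    static double[][] partialDescriptor(double[][] pts, int Mp) {
        double[][] full = fullDescriptor(pts);   // simple, not efficient
        int M = pts.length, M2 = 2 * Mp + 1;
        double[] re = new double[M2], im = new double[M2];
        for (int m = -Mp; m <= Mp; m++) {
            re[Math.floorMod(m, M2)] = full[0][Math.floorMod(m, M)];
            im[Math.floorMod(m, M2)] = full[1][Math.floorMod(m, M)];
        }
        return new double[][] { re, im };
    }
}
```

With this normalization, $G(0)$ is the centroid of the sample points and, for a circle traversed once counterclockwise, $G(1)$ equals its radius while all other coefficients vanish.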

FourierDescriptorFromPolygon  (class) 

This subclass of FourierDescriptor represents Fourier descriptors obtained directly from polygons (without contour sampling, see Alg. 26.5). It provides the single constructor method

FourierDescriptorFromPolygon(Point2D[] V, int Mp),

where V is a sequence of polygon vertices and Mp specifies the number of Fourier coefficient pairs.

PolygonSampler  (class) 

Instances of this utility class can be used to produce uniformly sampled polygons.

Point2D[] samplePolygonUniformly(Point2D[] V, int M)

Samples the closed polygon path specified by the vertices in V at M equidistant positions and returns the resulting point sequence (see Alg. 26.1).
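The idea of sampling a closed polygon at M equidistant positions can be sketched in Java as below. This is an independent illustration by arc length, not necessarily identical to Alg. 26.1 or to PolygonSampler: the class PolygonSamplingSketch is hypothetical, vertices are given as double[n][2] instead of Point2D[], and the closing edge from the last vertex back to the first is included.

```java
// Sketch of uniform sampling of a closed polygon by arc length.
// Assumptions: vertices as double[n][2]; edge V[n-1] -> V[0] closes the path;
// not necessarily identical to Alg. 26.1. Class name is hypothetical.
final class PolygonSamplingSketch {

    static double[][] sampleUniformly(double[][] V, int M) {
        int n = V.length;
        double[] cum = new double[n + 1];      // path length from V[0] to vertex i
        for (int i = 0; i < n; i++) {
            double[] a = V[i], b = V[(i + 1) % n];
            cum[i + 1] = cum[i] + Math.hypot(b[0] - a[0], b[1] - a[1]);
        }
        double total = cum[n];
        double[][] S = new double[M][];
        int seg = 0;                           // current polygon edge
        for (int k = 0; k < M; k++) {
            double u = total * k / M;          // target path length of sample k
            while (seg < n - 1 && cum[seg + 1] <= u) seg++;
            double[] a = V[seg], b = V[(seg + 1) % n];
            double len = cum[seg + 1] - cum[seg];
            double f = (len > 0) ? (u - cum[seg]) / len : 0;
            S[k] = new double[] { a[0] + f * (b[0] - a[0]), a[1] + f * (b[1] - a[1]) };
        }
        return S;
    }
}
```

For the unit square with M = 8, the samples are the four corners and the four edge midpoints, in path order starting at the first vertex.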

Example 

The code example in Prog. 26.1 demonstrates the use of the Fourier descriptor API. It assumes that the binary input image (ip) contains at least one connected foreground region. Region labeling and contour extraction are applied first, using methods provided by the imagingbook.regions and imagingbook.contours packages.²⁴ Subsequently, the longest region contour (C) is used to create a Fourier descriptor (fd) with $M_P = 15$ coefficient pairs. A partial reconstruction is calculated from the original Fourier descriptor with 100 sample points along the contour. The last lines show how a pair of invariant descriptors $(G^A, G^B)$ is obtained by applying the makeInvariant() method. Note that the code fragment in Prog. 26.1 is not complete but would typically be part of the run() method in an ImageJ plugin. The full version and additional code examples can be found on the book's website.


26.8  Discussion  and  Further  Reading 

The use of Fourier descriptors for shape description and matching dates back to the early 1960s [55, 81], advanced by the work of Zahn and Roskies [262], Granlund [93], Richard and Hemami [196], and Persoon and Fu [183, 184] in the 1970s, particularly in the context of character recognition and aircraft identification. Making Fourier descriptors invariant against various geometric transformations was a key issue from the very beginning, and several relevant contributions were published in the 1980s, including [245], [57], [143], and [189]. Unfortunately, as illustrated in this chapter, achieving robust invariance and uniqueness of representation in practice is not as easy as sometimes suggested in the literature, despite the simplicity and elegance of the underlying theory. In practice, normalization for descriptor invariance is quite difficult for arbitrary shapes because of possibly vanishing Fourier coefficients and the resulting sensitivity to noise.

Fourier  descriptors  have  nevertheless  become  popular  in  a  wide 
range  of  applications,  including  geology  and,  in  particular,  biological 
imaging,  as  documented  by  the  work  of  Lestrel  and  others  in  [146]. 
²⁴ See also Chapter 10.


1  ...
2  import java.awt.geom.Point2D;
3  import java.util.List;
4  import ij.process.ByteProcessor;
5  import imagingbook.lib.math.Complex;
6  import imagingbook.pub.fd.*;
7  import imagingbook.pub.regions.*;
8
9  ByteProcessor ip = ... ;  // assumed to contain a binary image
10
11 // segment ip and select the longest outer region contour:
12 RegionContourLabeling labeling =
13     new RegionContourLabeling(ip);
14 List<Contour> outerContours =
15     labeling.getAllOuterContours(true);
16 Contour contr = outerContours.get(0);  // get the longest contour
17 Point2D[] V = contr.getPointArray();
18
19 // create the Fourier descriptor for V with 15 coefficient pairs:
20 FourierDescriptor fd = new FourierDescriptorUniform(V, 15);
21
22 // reconstruct the corresponding shape with 100 contour points:
23 Complex[] R = fd.getReconstruction(100);
24
25 // create a pair of invariant descriptors (G^A, G^B):
26 FourierDescriptor[] fdAB = fd.makeInvariant();
27 FourierDescriptor fdA = fdAB[0];  // = G^A
28 FourierDescriptor fdB = fdAB[1];  // = G^B
29 ...

Prog. 26.1 Fourier descriptor code example. The input image ip is assumed to contain a binary image (line 9). The class RegionContourLabeling is used to find connected regions (line 13). Then the list of outer contours is retrieved (line 15) and the longest contour is assigned to V as an array of type Point2D (lines 16–17). In line 20, the contour V is used to create a Fourier descriptor with 15 coefficient pairs. Alternatively, we could have created a Fourier descriptor of the same length (number of coefficients) as the contour and then truncated it (using the truncate() method) to the specified number of coefficient pairs. A partial reconstruction of the contour (with 100 sample points) is calculated from the Fourier descriptor fd in line 23. Finally, a pair of invariant descriptors (contained in the array fdAB) is calculated in line 26.


Fourier descriptors have been extended to accommodate affine transformations and applied to 3D object identification [5] and stereo matching [257].

Although Fourier descriptors have been investigated to handle open contours and partial shapes [148], they are naturally best suited to dealing with closed contours, as we have described. Of course, this is a limitation if shapes are only partially visible or occluded. The presentation in this chapter was limited to what are frequently called "elliptical" Fourier descriptors [93], since they are most popular and well known. Other types of Fourier descriptors have been proposed, which are not covered here but can be found elsewhere in the literature (see, e.g., [126, p. 534] and [174, Ch. 7]).


26.9 Exercises

Exercise 26.1. Verify that the DFT spectrum is periodic, that is, that $G(-m) = G(M-m)$ holds for arbitrary $m \in \mathbb{Z}$ (as claimed in Eqn. (26.22)).
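Before proving the identity of Exercise 26.1 by hand, it can help to see it numerically. The following is a minimal Java sketch, not part of the book's API: it assumes the DFT is defined as $G(m) = \frac{1}{M}\sum_{k=0}^{M-1} g(k)\cdot e^{-\mathrm{i}2\pi mk/M}$ (the $1/M$ factor is an assumption and has no influence on periodicity), stores complex values as separate real/imaginary arrays, and uses a small hypothetical test sequence.

```java
// Numerical check of G(-m) == G(M - m) for a short complex-valued sequence g.
// Assumed DFT definition: G(m) = (1/M) * sum_k g(k) * e^{-i*2*pi*m*k/M}.
final class DftPeriodicity {

    /** Returns {Re, Im} of the DFT coefficient G(m); m may be any integer. */
    static double[] dft(double[] gRe, double[] gIm, int m) {
        int M = gRe.length;
        double sumRe = 0, sumIm = 0;
        for (int k = 0; k < M; k++) {
            double phi = -2 * Math.PI * m * k / M;
            double c = Math.cos(phi), s = Math.sin(phi);
            // complex product (gRe[k] + i*gIm[k]) * (c + i*s)
            sumRe += gRe[k] * c - gIm[k] * s;
            sumIm += gRe[k] * s + gIm[k] * c;
        }
        return new double[] {sumRe / M, sumIm / M};
    }

    public static void main(String[] args) {
        double[] gRe = {1.0, 2.0, 0.0, 3.0, -1.0};   // hypothetical test data
        double[] gIm = {0.0, 1.0, -1.0, 2.0, 0.5};
        int M = gRe.length;
        for (int m = 1; m < M; m++) {
            double[] a = dft(gRe, gIm, -m);
            double[] b = dft(gRe, gIm, M - m);
            double diff = Math.hypot(a[0] - b[0], a[1] - b[1]);
            System.out.printf("m = %d: |G(-m) - G(M-m)| = %.2e%n", m, diff);
        }
    }
}
```

The printed differences should be at the level of floating-point round-off; the exact argument follows from $e^{-\mathrm{i}2\pi Mk/M} = e^{-\mathrm{i}2\pi k} = 1$ for integer $k$.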

Exercise 26.2. Algorithm 26.9 shows an alternative solution to uniform polygon sampling. Implement this algorithm and verify that it is equivalent to Alg. 26.1 (implemented as method samplePolygonUniformly() in class PolygonSampler, see Sec. 26.7).

Exercise 26.3. Assume that the complete outer contour of a binary region is given as a sequence of $P$ boundary pixels with coordinates $V = (p_0, \ldots, p_{P-1})$. To produce a Fourier descriptor of length $M < P$ there are several options:

1. Sample the original contour $V$ at $M$ uniformly-spaced positions (see Alg. 26.1) and then calculate the Fourier descriptor of length $M$ using Alg. 26.2.

2. Calculate a partial Fourier descriptor of length $M$ from the original contour $V$ using Alg. 26.3.

3. Calculate the full Fourier descriptor (of length $P$) from the original contour $V$ (using Alg. 26.2) and subsequently truncate25 the Fourier descriptor to length $M$, as described in Eqns. (26.23) and (26.24).

4. Treat the original boundary coordinates $V$ as the vertices of a closed polygon and calculate a Fourier descriptor with $M_p = M \div 2$ coefficient pairs, using the trigonometric method described in Alg. 26.5.

Compare these approaches and discuss their individual merits or disadvantages in terms of efficiency and accuracy.

Alg. 26.9
Uniform sampling of a polygon path (alternative to Alg. 26.1, proposed by J. Heinzelreiter).

1:  SamplePolygonUniformly(V, M)
    Input: V = (v_0, ..., v_{N-1}), a sequence of N points representing the vertices of a closed 2D polygon; M, number of desired sample points. Returns a new sequence g = (g_0, ..., g_{M-1}) of complex values representing sample points sampled uniformly along the path of the input polygon V.
2:      N ← |V|
3:      Δ ← (1/M) · PathLength(V)                   ▷ segment length Δ, see Alg. 26.1
4:      Create map g: [0, M−1] → ℂ                   ▷ complex point sequence g
5:      g(0) ← Complex(V(0))
6:      i ← 0                                        ▷ index of path segment (v_i, v_{i+1})
7:      k ← 1                                        ▷ index of first unassigned point in g
8:      d ← 0                                        ▷ path distance between g(k−1) and V(i)
9:      while (i < N) ∧ (k < M) do
10:         v_A ← V(i)
11:         v_B ← V((i + 1) mod N)
12:         δ ← ‖v_B − v_A‖                          ▷ Euclidean distance
13:         if (Δ − d) < δ then
14:             s ← (Δ − d)/δ;  x ← v_A + s · (v_B − v_A)   ▷ x_k by linear interpolation
15:             g(k) ← Complex(x)
16:             d ← d − Δ
17:             k ← k + 1
18:         else
19:             d ← d + δ
20:             i ← i + 1
21:     return g.
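For Exercise 26.2, a minimal Java sketch of Alg. 26.9 may serve as a starting point. It is not the book's PolygonSampler class: the class and method names are hypothetical, points are assumed to be given as double[2] arrays (x, y), and each complex sample g(k) is returned as a double[2] pair (re, im). The strict "<" test of line 13 also means that duplicate consecutive vertices (δ = 0) never cause a division by zero, because the loop keeps d ≤ Δ.

```java
// Sketch of Alg. 26.9 (uniform sampling of a closed polygon path).
// Points are double[2] arrays (x, y); each sample g(k) is returned as (re, im) = (x, y).
final class UniformPolygonSampler {

    /** Total length of the closed polygon V, including the closing edge. */
    static double pathLength(double[][] V) {
        double L = 0;
        int N = V.length;
        for (int i = 0; i < N; i++) {
            double[] a = V[i], b = V[(i + 1) % N];
            L += Math.hypot(b[0] - a[0], b[1] - a[1]);
        }
        return L;
    }

    static double[][] samplePolygonUniformly(double[][] V, int M) {
        int N = V.length;
        if (N == 0 || M < 1) throw new IllegalArgumentException("need N > 0 and M > 0");
        double delta = pathLength(V) / M;      // segment length (Delta)
        double[][] g = new double[M][];
        g[0] = V[0].clone();                   // g(0) <- V(0)
        int i = 0;                             // current path segment (V(i), V(i+1))
        int k = 1;                             // index of first unassigned sample
        double d = 0;                          // signed path distance from g(k-1) to V(i)
        while (i < N && k < M) {
            double[] vA = V[i];
            double[] vB = V[(i + 1) % N];
            double len = Math.hypot(vB[0] - vA[0], vB[1] - vA[1]);   // delta (segment length)
            if (delta - d < len) {
                double s = (delta - d) / len;                       // interpolation factor
                g[k] = new double[] {vA[0] + s * (vB[0] - vA[0]),
                                     vA[1] + s * (vB[1] - vA[1])};
                d = d - delta;
                k = k + 1;
            } else {
                d = d + len;                   // advance to the next polygon vertex
                i = i + 1;
            }
        }
        return g;
    }

    public static void main(String[] args) {
        double[][] square = {{0, 0}, {1, 0}, {1, 1}, {0, 1}};   // unit square, path length 4
        for (double[] p : samplePolygonUniformly(square, 8)) {
            System.out.println(p[0] + " + i*" + p[1]);
        }
    }
}
```

For the unit square with M = 8, the samples should land on the four corners and the four edge midpoints, which is also what Alg. 26.1 produces for this input.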

Exercise 26.4. Test the Fourier descriptor normalization described in Algs. 26.6 and 26.7 (implemented by method makeInvariant() in the Java API) for changes in scale, start point shift, and shape rotation on a suitable set of binary shapes (e.g., images from the KIMIA dataset [134]). See the examples for shape rotation and (implicit) start point shifts in Fig. 26.21. How reliably do the normalized Fourier descriptors of the modified shapes match their corresponding originals?

25 See method truncate(int Mp) in Sec. 26.7.

Exercise 26.5. Magnitude-only matching (see Sec. 26.6.1) is much simpler than complex-valued matching (see Sec. 26.6.2) of Fourier descriptors, since no normalization for phase (start point shift and shape rotation) is required. However, it can be assumed that different shapes are more likely to be confused if the phase information is ignored. Test this hypothesis on a large number and variety of different shapes. Compare the confusion probability for magnitude-only vs. complex-valued matching.

Appendix A

Mathematical Symbols and Notation


A.1 Symbols

The following symbols are used in the main text primarily with the denotations given here. While some symbols may be used for purposes other than the ones listed, the meaning should always be clear in the particular context.

$(a_0, \ldots, a_{n-1})$  A vector or list, that is, an ordered sequence of $n$ elements of the same type. Unlike a set (see below), a list may contain the same element more than once. If used to denote a vector, then $(a_0, \ldots, a_{n-1})$ is usually a row vector and $(a_0, \ldots, a_{n-1})^\mathsf{T}$ is the corresponding (transposed) column vector.1 If used to represent a list,2 $()$ represents the empty list and $(a)$ is a list with a single element $a$. $|A|$ is the length of the sequence $A$, that is, the number of contained elements. $A \smallfrown B$ denotes the concatenation of $A$, $B$. $A(i)$ or $a_i$ refers to the $i$-th element of $A$. $A(i) \leftarrow x$ means that the $i$-th element of $A$ is set to (i.e., replaced by) the quantity $x$.

$\{a, b, c, d, \ldots\}$  A set, that is, an unordered collection of distinct elements. A particular element $x$ can be contained in a set at most once. $\{\,\}$ denotes the empty set. $|A|$ is the size (cardinality) of the set $A$. $A \cup B$ is the union and $A \cap B$ is the intersection of two sets $A$, $B$. $x \in A$ means that the element $x$ is contained in $A$.

$(A, B, C)$  A tuple, that is, a fixed-size, ordered sequence of elements, each possibly of a different type.3

1 In most programming environments, vectors are implemented as one-dimensional arrays, with elements being referred to by position (index).

2 Lists are usually implemented with dynamic data structures, such as linked lists. Java's Collections framework provides numerous easy-to-use list implementations.

3 Tuples are typically implemented as objects (in Java or C++) or structures (in C) with elements being referred to by name.
$[a, b]$  Numeric interval; $x \in [a, b]$ means $a \le x \le b$. Similarly, $x \in [a, b)$ says that $a \le x < b$.

$|A|$  Length (number of elements) of a sequence (see above) or size (cardinality) of a set $A$, that is, $|A| = \operatorname{card} A$.

$|\mathbf{A}|$  Determinant of a matrix $\mathbf{A}$ ($|\mathbf{A}| = \det(\mathbf{A})$).

$|x|$  Absolute value (magnitude) of a scalar or complex quantity $x$.

$\|\mathbf{x}\|$  Euclidean ($L_2$) norm of a vector $\mathbf{x}$. $\|\mathbf{x}\|_n$ denotes the magnitude of $\mathbf{x}$ using a particular norm $L_n$.

$\lceil x \rceil$  "Ceil" of $x$, the smallest integer $z \in \mathbb{Z}$ greater than or equal to $x \in \mathbb{R}$. For example, $\lceil 3.141 \rceil = 4$, $\lceil -1.2 \rceil = -1$.

$\lfloor x \rfloor$  "Floor" of $x$, the largest integer $z \in \mathbb{Z}$ smaller than or equal to $x \in \mathbb{R}$. For example, $\lfloor 3.141 \rfloor = 3$, $\lfloor -1.2 \rfloor = -2$.

$\div$  Integer division operator: $a \div b$ denotes the quotient of the two integers $a$, $b$. For example, $5 \div 3 = 1$ and $-13 \div 4 = -3$ (equivalent to Java's "/" operator in the case of integer operands).

$*$  Linear convolution operator (see Sec. 5.3.1).

$\circledast$  Linear correlation operator (see Sec. 23.1.1).

$\otimes$  Outer vector product (see Sec. B.3.2).

$\times$  Cross product (between vectors or complex quantities; see Sec. B.3.3).

$\oplus$  Morphological dilation operator (see Sec. 9.2.3).

$\ominus$  Morphological erosion operator (see Sec. 9.2.4).

$\circ$  Morphological opening operator (see Sec. 9.3.1).

$\bullet$  Morphological closing operator (see Sec. 9.3.2).

$\smallfrown$  Concatenation operator. Given two sequences $A = (a, b, c)$ and $B = (d, e)$, $A \smallfrown B$ denotes the concatenation of $A$ and $B$, with the result $(a, b, c, d, e)$. Inserting a single element $x$ at the end or front of the list $A$ is written as $A \smallfrown (x)$ or $(x) \smallfrown A$, resulting in $(a, b, c, x)$ or $(x, a, b, c)$, respectively.

$\sim$  "Similarity" relation used in the context of random variables and statistical distributions.

$\approx$  "Approximately equal" relation.

$\equiv$  Equivalence relation.

$\leftarrow$  Assignment operator: $a \leftarrow \mathit{expr}$ means that expression $\mathit{expr}$ is evaluated and subsequently the result is assigned to the variable $a$.

$\stackrel{+}{\leftarrow}$  Incremental assignment operator: $a \stackrel{+}{\leftarrow} b$ is equivalent to $a \leftarrow a + b$.

$:=$  Function definition operator (used in algorithms). For example, $f(x) := x^2 + 5$ defines a function $f()$ with the bound variable (formal function argument) $x$.

$\ldots$  "upto" (incrementing) iteration, used in loop constructs like for $q \leftarrow 1, \ldots, K$ (with $q = 1, 2, \ldots, K-1, K$).

$\ldots$  "downto" (decrementing) iteration, for example, for $q \leftarrow K, \ldots, 1$ (with $q = K, K-1, \ldots, 2, 1$).
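Because the rounding-related notation maps onto Java in slightly different ways, a short sketch may help. It is only an illustration with hypothetical helper names: ceil and floor use Math.ceil and Math.floor, $\div$ is Java's integer "/" (truncation toward zero, as stated in the table), and round(x) follows the definition $\lfloor x + 0.5 \rfloor$ given further below under "round". Math.floorDiv is shown only to contrast truncation with flooring for negative operands.

```java
// Java counterparts of ceil, floor, integer division (a ÷ b) and round(x) = floor(x + 0.5).
// Helper names are hypothetical; only standard java.lang.Math methods are used.
final class RoundingNotation {

    static int ceil(double x)  { return (int) Math.ceil(x); }
    static int floor(double x) { return (int) Math.floor(x); }

    /** a ÷ b as defined in the table: Java's "/" truncates toward zero. */
    static int intDiv(int a, int b) { return a / b; }

    /** round(x) = floor(x + 0.5), following the "round" entry of the table. */
    static long round(double x) { return (long) Math.floor(x + 0.5); }

    public static void main(String[] args) {
        System.out.println(ceil(3.141) + " " + ceil(-1.2));      // 4 -1
        System.out.println(floor(3.141) + " " + floor(-1.2));    // 3 -2
        System.out.println(intDiv(5, 3) + " " + intDiv(-13, 4)); // 1 -3
        // for negative operands, truncation differs from flooring:
        System.out.println(Math.floorDiv(-13, 4));               // -4
        System.out.println(round(2.5) + " " + round(-2.5));      // 3 -2
    }
}
```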


$\wedge$  Logical "and" operator.

$\vee$  Logical "or" operator.

$\partial$  Partial derivative operator (see Sec. 6.2.1). For example, $\frac{\partial f}{\partial x_i}$ denotes the first derivative of the multi-dimensional function $f(x_1, x_2, \ldots, x_n)\colon \mathbb{R}^n \to \mathbb{R}$ along variable $x_i$, $\frac{\partial^2 f}{\partial x_i^2}$ is the second derivative (i.e., differentiating $f$ twice along variable $x_i$), etc.

$\nabla$  Gradient operator. The gradient of a multi-dimensional function $f(x_1, x_2, \ldots, x_n)\colon \mathbb{R}^n \to \mathbb{R}$, denoted $\nabla f$ (also $\nabla_f$ or $\operatorname{grad} f$), is the vector of its first partial derivatives (see also Sec. C.2.2).

$\nabla^2$  Laplace operator (or Laplacian). The Laplacian of a multi-dimensional function $f(x_1, x_2, \ldots, x_n)\colon \mathbb{R}^n \to \mathbb{R}$, denoted $\nabla^2 f$ (or $\nabla^2_f$), is the sum of its second partial derivatives (see Sec. C.2.5).

$\mathbf{0}$  Zero vector, $\mathbf{0} = (0, \ldots, 0)^\mathsf{T}$.

adj  Adjugate of a square matrix, denoted $\operatorname{adj}(\mathbf{A})$; also called adjoint in older texts.


AND  Bitwise "and" operation. Example: (0011b AND 1010b) = 0010b (binary) and (3 AND 6) = 2 (decimal).

ArcTan(x, y)  Inverse tangent function. The result of ArcTan(x, y) is equivalent to $\arctan(\frac{y}{x}) = \tan^{-1}(\frac{y}{x})$ but with two arguments and returning angles in the range $[-\pi, +\pi]$ (i.e., covering all four quadrants). ArcTan(x, y) is equivalent to the ArcTan[x, y] function in Mathematica and the Math.atan2(y, x) method in Java (but note the reversed arguments!).
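The reversed argument order is an easy source of bugs, so a tiny wrapper can make the mapping explicit. This is a sketch with a hypothetical class name; it only relies on the standard Math.atan2(y, x). A single-argument arctan(y/x) would return the same angle for (1, 1) and (-1, -1), whereas ArcTan distinguishes all four quadrants.

```java
// ArcTan(x, y) from the symbol table, mapped onto Java's Math.atan2(y, x).
final class ArcTanNotation {

    /** Note the argument order: ArcTan takes (x, y), Math.atan2 takes (y, x). */
    static double arcTan(double x, double y) {
        return Math.atan2(y, x);
    }

    public static void main(String[] args) {
        System.out.println(arcTan(1, 1));    // approx.  pi/4   (first quadrant)
        System.out.println(arcTan(-1, 1));   // approx.  3pi/4  (second quadrant)
        System.out.println(arcTan(-1, -1));  // approx. -3pi/4  (third quadrant)
        System.out.println(arcTan(1, -1));   // approx. -pi/4   (fourth quadrant)
    }
}
```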

$\mathbb{C}$  The set of complex numbers.

card  Size (cardinality) of a set. $\operatorname{card}(A) = |A|$ (see also Sec. 3.1).

det  Determinant of a matrix ($\det(\mathbf{A}) = |\mathbf{A}|$).

DFT  Discrete Fourier transform (see Sec. 18.3).

$e$  Euler's number (the base of the natural logarithm).

$\mathbf{e}$  Unit vector. For example, $\mathbf{e}_x = (1, 0)^\mathsf{T}$ denotes the 2D unit vector in x-direction. $\mathbf{e}_\theta = (\cos\theta, \sin\theta)^\mathsf{T}$ is the 2D unit vector oriented at angle $\theta$ and $\mathbf{e}_i, \mathbf{e}_j, \mathbf{e}_k$ are the unit vectors along the coordinate axes in 3D.

exp  Exponential function: $\exp(x) = e^x$.

$\mathcal{F}$  Continuous Fourier transform (see Sec. 18.1.4).

false  Boolean constant (false $= \neg$true).

grad  Gradient operator (see $\nabla$).

$h$  Histogram of an image (see Sec. 3.1).

$H$  Cumulative histogram (see Sec. 3.6).

$\mathbf{H}$  Hessian matrix (see Sec. C.2.6).

hom  Operator for converting Cartesian to homogeneous coordinates. $\operatorname{hom}(\mathbf{x}) = \underline{\mathbf{x}}$ maps the Cartesian point $\mathbf{x}$ to a corresponding homogeneous point $\underline{\mathbf{x}}$; the reverse mapping is denoted $\operatorname{hom}^{-1}(\underline{\mathbf{x}}) = \mathbf{x}$ (see Sec. B.5).

$\mathrm{i}$  Imaginary unit ($\mathrm{i}^2 = -1$), see Sec. A.3.
$I$  Image with scalar pixel values (e.g., an intensity or grayscale image). $I(u,v) \in \mathbb{R}$ is the pixel value at position $(u,v)$.

$\mathbf{I}$  Vector-valued image, for example, an RGB color image with 3D color vectors $\mathbf{I}(u,v) \in \mathbb{R}^3$ at position $(u,v)$.

$\mathbf{I}_n$  Identity matrix of size $n \times n$. For example, $\mathbf{I}_2 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$ is the $2 \times 2$ identity matrix.

$\mathbf{J}$  Jacobian matrix (see Sec. C.2.1).

$L_1, L_2, L_\infty$  Common distance measures or norms (see Eqns. (15.23)-(15.25)).

$M \times N$  Domain of pixel coordinates $(u, v)$ for an image with $M$ columns (width) and $N$ rows (height); used as a shortcut notation for the set $\{0, \ldots, M-1\} \times \{0, \ldots, N-1\}$.

mod  Modulus operator: $(a \bmod b)$ is the remainder of the integer division $a \div b$ (see Sec. F.1.2).

$\mu$  Arithmetic mean value.

$\mathbb{N}$  The set of natural numbers; $\mathbb{N} = \{1, 2, 3, \ldots\}$, $\mathbb{N}_0 = \{0, 1, 2, \ldots\}$.

nil  Null ("nothing") constant, typically used in algorithms to denote an invalid quantity (similar to null in Java).

$p$  Discrete probability density function (see Sec. 4.6.1).

$P$  Discrete probability distribution function or cumulative probability density (see Sec. 4.6.1).

$Q$  Quadrilateral (see Sec. 21.1.4).

$\mathbb{R}$  The set of real numbers.

$R, G, B$  Red, green, and blue color components.

rank  Rank of a matrix $\mathbf{A}$, denoted by $\operatorname{rank}(\mathbf{A})$.

round  Rounding function: returns the integer closest to the scalar $x \in \mathbb{R}$. $\operatorname{round}(x) = \lfloor x + 0.5 \rfloor$.

$\sigma$  Standard deviation (square root of the variance $\sigma^2$).

$S_1$  Unit square (see Sec. 21.1.4).

sgn  "Sign" or "signum" function:
$$ \operatorname{sgn}(x) = \begin{cases} 1 & \text{for } x > 0 \\ 0 & \text{for } x = 0 \\ -1 & \text{for } x < 0 \end{cases} $$

$\tau$  Interval in time or space.

$t$  Continuous time variable.

$t$  Threshold value.

$^\mathsf{T}$  Transpose of a vector ($\mathbf{a}^\mathsf{T}$) or matrix ($\mathbf{A}^\mathsf{T}$).

trace  Trace (sum of the diagonal elements) of a matrix, e.g., $\operatorname{trace}(\mathbf{A})$.

true  Boolean constant (true $= \neg$false).

$\mathbf{u} = (u, v)$  Discrete 2D coordinate variable with $u, v \in \mathbb{Z}$.

$\mathbf{x} = (x, y)$  Continuous 2D coordinate variable with $x, y \in \mathbb{R}$.

XOR  Bitwise "xor" (exclusive OR) operator. Example: (0011b XOR 1010b) = 1001b (binary) and (3 XOR 6) = 5 (decimal).

$\mathbb{Z}$  The set of integers.
A.2 Set Operators

$|A|$  The size of the set $A$ (equal to $\operatorname{card}(A)$).

$\forall x \ldots$  "All" quantifier (for all $x$, ...).

$\exists x \ldots$  "Exists" quantifier (there is some $x$ for which ...).

$\cup$  Set union (e.g., $A \cup B$).

$\cap$  Set intersection (e.g., $A \cap B$).

$\bigcup_i A_i$  Union of multiple sets $A_i$.

$\bigcap_i A_i$  Intersection over multiple sets $A_i$.

$\setminus$  Set difference: if $x \in A \setminus B$, then $x \in A$ and $x \notin B$.


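A.3 Complex Numbers

Basic relations:

$$ z = a + \mathrm{i}\cdot b \quad (\text{with } z, \mathrm{i} \in \mathbb{C},\; a, b \in \mathbb{R},\; \mathrm{i}^2 = -1) \tag{A.1} $$
$$ s\cdot z = s\cdot a + \mathrm{i}\cdot s\cdot b \quad (\text{for } s \in \mathbb{R}) \tag{A.2} $$
$$ |z| = \sqrt{a^2 + b^2} \tag{A.3} $$
$$ |s\cdot z| = |s|\cdot|z| \tag{A.4} $$
$$ z = a + \mathrm{i}\cdot b = |z|\cdot(\cos\psi + \mathrm{i}\cdot\sin\psi) \tag{A.5} $$
$$ z = |z|\cdot e^{\mathrm{i}\cdot\psi} \quad (\text{with } \psi = \mathrm{ArcTan}(a, b)) \tag{A.6} $$
$$ \mathrm{Re}(a + \mathrm{i}\cdot b) = a, \qquad \mathrm{Re}(e^{\mathrm{i}\varphi}) = \cos\varphi \tag{A.7} $$
$$ \mathrm{Im}(a + \mathrm{i}\cdot b) = b, \qquad \mathrm{Im}(e^{\mathrm{i}\varphi}) = \sin\varphi \tag{A.8} $$
$$ e^{\mathrm{i}\varphi} = \cos\varphi + \mathrm{i}\cdot\sin\varphi \tag{A.9} $$
$$ e^{-\mathrm{i}\varphi} = \cos\varphi - \mathrm{i}\cdot\sin\varphi \tag{A.10} $$
$$ \cos(\varphi) = \tfrac{1}{2}\cdot\bigl(e^{\mathrm{i}\varphi} + e^{-\mathrm{i}\varphi}\bigr) \tag{A.11} $$
$$ \sin(\varphi) = \tfrac{1}{2\mathrm{i}}\cdot\bigl(e^{\mathrm{i}\varphi} - e^{-\mathrm{i}\varphi}\bigr) \tag{A.12} $$
$$ z^* = a - \mathrm{i}\cdot b \quad (\text{complex conjugate}) \tag{A.13} $$
$$ z\cdot z^* = z^*\cdot z = |z|^2 = a^2 + b^2 \tag{A.14} $$
$$ z^0 = (a + \mathrm{i}\cdot b)^0 = (1 + \mathrm{i}\cdot 0) = 1 \tag{A.15} $$

Arithmetic operations:

$$ z_1 = (a_1 + \mathrm{i}\cdot b_1) = |z_1|\cdot e^{\mathrm{i}\varphi_1} \tag{A.16} $$
$$ z_2 = (a_2 + \mathrm{i}\cdot b_2) = |z_2|\cdot e^{\mathrm{i}\varphi_2} \tag{A.17} $$
$$ z_1 + z_2 = (a_1 + a_2) + \mathrm{i}\cdot(b_1 + b_2) \tag{A.18} $$
$$ z_1\cdot z_2 = (a_1\cdot a_2 - b_1\cdot b_2) + \mathrm{i}\cdot(a_1\cdot b_2 + b_1\cdot a_2) \tag{A.19} $$
$$ z_1\cdot z_2 = |z_1|\cdot|z_2|\cdot e^{\mathrm{i}(\varphi_1 + \varphi_2)} \tag{A.20} $$
$$ \frac{z_1}{z_2} = \frac{a_1\cdot a_2 + b_1\cdot b_2}{a_2^2 + b_2^2} + \mathrm{i}\cdot\frac{a_2\cdot b_1 - a_1\cdot b_2}{a_2^2 + b_2^2} = \frac{|z_1|}{|z_2|}\cdot e^{\mathrm{i}(\varphi_1 - \varphi_2)} \tag{A.21} $$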
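To see the Cartesian and polar forms of the arithmetic operations agree, the following minimal Java sketch implements them on a small immutable class. SimpleComplex is a hypothetical name chosen for this illustration; it is not the imagingbook Complex class used in the main text, whose API may differ.

```java
// Minimal immutable complex number illustrating the Cartesian and polar
// arithmetic relations listed above. Hypothetical helper class.
final class SimpleComplex {
    final double re, im;

    SimpleComplex(double re, double im) { this.re = re; this.im = im; }

    /** Polar construction |z| * e^{i*phi}. */
    static SimpleComplex fromPolar(double mag, double phi) {
        return new SimpleComplex(mag * Math.cos(phi), mag * Math.sin(phi));
    }

    double abs() { return Math.hypot(re, im); }                  // |z| = sqrt(a^2 + b^2)
    double arg() { return Math.atan2(im, re); }                  // psi = ArcTan(a, b)
    SimpleComplex conj() { return new SimpleComplex(re, -im); }  // z* = a - i*b

    SimpleComplex plus(SimpleComplex z) {
        return new SimpleComplex(re + z.re, im + z.im);
    }

    /** Cartesian product (a1*a2 - b1*b2) + i*(a1*b2 + b1*a2). */
    SimpleComplex times(SimpleComplex z) {
        return new SimpleComplex(re * z.re - im * z.im, re * z.im + im * z.re);
    }

    /** Cartesian quotient this / z; z must not be zero. */
    SimpleComplex div(SimpleComplex z) {
        double d = z.re * z.re + z.im * z.im;                    // a2^2 + b2^2
        return new SimpleComplex((re * z.re + im * z.im) / d, (z.re * im - re * z.im) / d);
    }

    @Override public String toString() { return re + " + i*" + im; }

    public static void main(String[] args) {
        SimpleComplex z1 = new SimpleComplex(1, 2), z2 = new SimpleComplex(3, 4);
        System.out.println(z1.times(z2));   // z1*z2 = -5 + 10i
        System.out.println(z1.div(z2));     // z1/z2 = 0.44 + 0.08i (up to rounding)
        // polar form of the product: magnitudes multiply, angles add
        System.out.println(fromPolar(z1.abs() * z2.abs(), z1.arg() + z2.arg()));
    }
}
```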




Appendix B

Linear Algebra

This part contains a compact set of elementary tools and concepts from algebra and calculus that are referenced in the main text. Many good textbooks (probably including some of your school books) are available on this subject, for example, [35, 36, 145, 264]. For numerical aspects of linear algebra see [160, 190].


B.1 Vectors and Matrices

Here we describe the basic notation for vectors in two and three dimensions. Let

$$ \mathbf{a} = \begin{pmatrix} a_0 \\ a_1 \end{pmatrix}, \qquad \mathbf{b} = \begin{pmatrix} b_0 \\ b_1 \end{pmatrix} \tag{B.1} $$

denote vectors $\mathbf{a}$, $\mathbf{b}$ in 2D, and analogously

$$ \mathbf{a} = \begin{pmatrix} a_0 \\ a_1 \\ a_2 \end{pmatrix}, \qquad \mathbf{b} = \begin{pmatrix} b_0 \\ b_1 \\ b_2 \end{pmatrix} \tag{B.2} $$

vectors in 3D (with $a_i, b_i \in \mathbb{R}$). Vectors are used to describe 2D or 3D points (relative to the origin of the coordinate system) or the displacement between two arbitrary points in the corresponding space.

We commonly use upper-case letters to denote a matrix, for example,

$$ \mathbf{A} = \begin{pmatrix} A_{0,0} & A_{0,1} \\ A_{1,0} & A_{1,1} \\ A_{2,0} & A_{2,1} \end{pmatrix}. \tag{B.3} $$

This matrix consists of 3 rows and 2 columns; in other words, $\mathbf{A}$ is of size (3, 2). Its individual elements are referenced as $A_{i,j}$, where $i$ is the row index (vertical coordinate) and $j$ is the column index (horizontal coordinate).1

1 Note that the usual notation for matrix coordinates is (unlike image coordinates) vertical-first!
The transpose of $\mathbf{A}$, denoted $\mathbf{A}^\mathsf{T}$, is obtained by exchanging rows and columns, that is,

$$ \mathbf{A}^\mathsf{T} = \begin{pmatrix} A_{0,0} & A_{0,1} \\ A_{1,0} & A_{1,1} \\ A_{2,0} & A_{2,1} \end{pmatrix}^{\!\mathsf{T}} = \begin{pmatrix} A_{0,0} & A_{1,0} & A_{2,0} \\ A_{0,1} & A_{1,1} & A_{2,1} \end{pmatrix}. \tag{B.4} $$

The inverse of a square matrix $\mathbf{A}$ is denoted $\mathbf{A}^{-1}$, such that

$$ \mathbf{A}\cdot\mathbf{A}^{-1} = \mathbf{I} \quad\text{and}\quad \mathbf{A}^{-1}\cdot\mathbf{A} = \mathbf{I} \tag{B.5} $$

($\mathbf{I}$ is the identity matrix). Note that not every square matrix has an inverse. Calculation of the inverse can be performed in closed form up to the size (3, 3); for example, see Eqn. (21.29) and Eqn. (24.47). In general, the use of standard numerical methods is recommended (see Sec. B.6).
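For the smallest case, the closed form is simply the adjugate divided by the determinant, $\mathbf{A}^{-1} = \frac{1}{\det(\mathbf{A})}\cdot\operatorname{adj}(\mathbf{A})$. The following Java sketch (hypothetical class name) illustrates this for a $2 \times 2$ matrix. It tests for an exactly zero determinant only; a practical implementation would rather compare against a small tolerance, or use the numerical methods mentioned above.

```java
// Closed-form inverse of a 2x2 matrix: A^{-1} = adj(A) / det(A).
final class Inverse2x2 {

    /** Returns the inverse of A = [[a, b], [c, d]], or null if A is singular. */
    static double[][] inverse(double[][] A) {
        double a = A[0][0], b = A[0][1], c = A[1][0], d = A[1][1];
        double det = a * d - b * c;
        if (det == 0) {
            return null;                      // no inverse exists
        }
        // adj(A) = [[d, -b], [-c, a]]
        return new double[][] {
            { d / det, -b / det},
            {-c / det,  a / det}
        };
    }

    public static void main(String[] args) {
        double[][] Ai = inverse(new double[][] {{4, 7}, {2, 6}});
        System.out.println(java.util.Arrays.deepToString(Ai));   // approx. [[0.6, -0.7], [-0.2, 0.4]]
        System.out.println(inverse(new double[][] {{1, 2}, {2, 4}}));   // null (singular)
    }
}
```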


B.1.1 Column and Row Vectors

For practical purposes, a vector can be considered a special case of a matrix. In particular, the $m$-dimensional column vector

$$ \mathbf{a} = \begin{pmatrix} a_0 \\ \vdots \\ a_{m-1} \end{pmatrix} \tag{B.6} $$

corresponds to a matrix of size $(m, 1)$, while its transpose $\mathbf{a}^\mathsf{T}$ is a row vector and thus like a matrix of size $(1, m)$. By default, and unless otherwise noted, any vector is implicitly assumed to be a column vector.


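B.1.2 Length (Norm) of a Vector

The length or Euclidean norm ($L_2$ norm) of a vector $\mathbf{a} = (a_0, \ldots, a_{m-1})^\mathsf{T}$, denoted $\|\mathbf{a}\|$, is defined as

$$ \|\mathbf{a}\| = \Bigl(\sum_{i=0}^{m-1} a_i^2\Bigr)^{1/2}. \tag{B.7} $$

For example, the length of the 3D vector $\mathbf{x} = (x, y, z)^\mathsf{T}$ is

$$ \|\mathbf{x}\| = \sqrt{x^2 + y^2 + z^2}. \tag{B.8} $$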


B.2 Matrix Multiplication

B.2.1 Scalar Multiplication

The product of a real-valued matrix and a scalar value $s \in \mathbb{R}$ is defined as

$$ s\cdot\mathbf{A} = \mathbf{A}\cdot s = \begin{pmatrix} s\cdot A_{0,0} & \cdots & s\cdot A_{0,n-1} \\ \vdots & \ddots & \vdots \\ s\cdot A_{m-1,0} & \cdots & s\cdot A_{m-1,n-1} \end{pmatrix}. \tag{B.9} $$


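B.2.2 Product of Two Matrices

We say that a matrix is of size $(m, n)$ if it consists of $m$ rows and $n$ columns. Given two matrices $\mathbf{A}$, $\mathbf{B}$ of size $(m, n)$ and $(p, q)$, respectively, the product $\mathbf{A}\cdot\mathbf{B}$ is only defined if $n = p$. Thus the number of columns ($n$) in $\mathbf{A}$ must always match the number of rows ($p$) in $\mathbf{B}$. The result is a new matrix $\mathbf{C}$ of size $(m, q)$, that is,

$$ \underbrace{\mathbf{C}}_{(m,q)} = \underbrace{\mathbf{A}}_{(m,n)} \cdot \underbrace{\mathbf{B}}_{(n,q)} = \begin{pmatrix} A_{0,0} & \cdots & A_{0,n-1} \\ \vdots & \ddots & \vdots \\ A_{m-1,0} & \cdots & A_{m-1,n-1} \end{pmatrix} \cdot \begin{pmatrix} B_{0,0} & \cdots & B_{0,q-1} \\ \vdots & \ddots & \vdots \\ B_{n-1,0} & \cdots & B_{n-1,q-1} \end{pmatrix} = \begin{pmatrix} C_{0,0} & \cdots & C_{0,q-1} \\ \vdots & \ddots & \vdots \\ C_{m-1,0} & \cdots & C_{m-1,q-1} \end{pmatrix}, \tag{B.10} $$

with the elements

$$ C_{i,j} = \sum_{k=0}^{n-1} A_{i,k}\cdot B_{k,j}, \tag{B.11} $$

for $i = 0, \ldots, m-1$ and $j = 0, \ldots, q-1$. Note that this product is not commutative, that is, $\mathbf{A}\cdot\mathbf{B} \neq \mathbf{B}\cdot\mathbf{A}$ in general.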
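Eqn. (B.11) translates directly into a triple loop. The Java sketch below (hypothetical class name, plain double[][] arrays in row-major order) also checks the size condition $n = p$ and demonstrates the non-commutativity with a small example.

```java
// Matrix product C = A * B following Eqn. (B.11): C[i][j] = sum_k A[i][k] * B[k][j].
final class MatrixProduct {

    static double[][] multiply(double[][] A, double[][] B) {
        int m = A.length, n = A[0].length;     // A is of size (m, n)
        int p = B.length, q = B[0].length;     // B is of size (p, q)
        if (n != p) {
            throw new IllegalArgumentException("columns of A (" + n + ") != rows of B (" + p + ")");
        }
        double[][] C = new double[m][q];       // result is of size (m, q)
        for (int i = 0; i < m; i++) {
            for (int j = 0; j < q; j++) {
                double sum = 0;
                for (int k = 0; k < n; k++) {
                    sum += A[i][k] * B[k][j];
                }
                C[i][j] = sum;
            }
        }
        return C;
    }

    public static void main(String[] args) {
        double[][] A = {{1, 2}, {3, 4}};
        double[][] B = {{5, 6}, {7, 8}};
        System.out.println(java.util.Arrays.deepToString(multiply(A, B)));  // [[19.0, 22.0], [43.0, 50.0]]
        System.out.println(java.util.Arrays.deepToString(multiply(B, A)));  // [[23.0, 34.0], [31.0, 46.0]]
    }
}
```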


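B.2.3 Matrix-Vector Products

The product $\mathbf{A}\cdot\mathbf{x}$ between a matrix $\mathbf{A}$ and a vector $\mathbf{x}$ is only a special case of the matrix-matrix multiplication given in Eqn. (B.10). In particular, if $\mathbf{x} = (x_0, \ldots, x_{n-1})^\mathsf{T}$ is an $n$-dimensional column vector (i.e., a matrix of size $(n, 1)$), then the multiplication

$$ \underbrace{\mathbf{y}}_{(m,1)} = \underbrace{\mathbf{A}}_{(m,n)} \cdot \underbrace{\mathbf{x}}_{(n,1)} \tag{B.12} $$

is only defined if the matrix $\mathbf{A}$ is of size $(m, n)$, for arbitrary $m \geq 1$. The result $\mathbf{y}$ is a column vector of length $m$ (equivalent to a matrix of size $(m, 1)$). For example (with $m = 2$, $n = 3$),

$$ \mathbf{A}\cdot\mathbf{x} = \underbrace{\begin{pmatrix} A & B & C \\ D & E & F \end{pmatrix}}_{(2,3)} \cdot \underbrace{\begin{pmatrix} x \\ y \\ z \end{pmatrix}}_{(3,1)} = \underbrace{\begin{pmatrix} A\cdot x + B\cdot y + C\cdot z \\ D\cdot x + E\cdot y + F\cdot z \end{pmatrix}}_{(2,1)}. \tag{B.13} $$

Here $\mathbf{A}$ operates on the column vector $\mathbf{x}$ "from the left", that is, $\mathbf{A}\cdot\mathbf{x}$ is the left-sided matrix-vector product of $\mathbf{A}$ and $\mathbf{x}$.

Similarly, a right-sided multiplication of a row vector $\mathbf{x}^\mathsf{T}$ of length $m$ with a matrix of size $(m, n)$ is performed as

$$ \underbrace{\mathbf{x}^\mathsf{T}}_{(1,m)} \cdot \underbrace{\mathbf{B}}_{(m,n)} = \underbrace{\mathbf{z}}_{(1,n)}, \tag{B.14} $$

where the result $\mathbf{z}$ is an $n$-dimensional row vector; for example (again with $m = 2$, $n = 3$),

$$ \underbrace{\mathbf{x}^\mathsf{T}}_{(1,2)} \cdot \underbrace{\mathbf{B}}_{(2,3)} = (x, y)\cdot\begin{pmatrix} A & B & C \\ D & E & F \end{pmatrix} = \underbrace{(x\cdot A + y\cdot D,\; x\cdot B + y\cdot E,\; x\cdot C + y\cdot F)}_{(1,3)}. \tag{B.15} $$

In general, if $\mathbf{A}\cdot\mathbf{x}$ is defined, then

$$ \mathbf{A}\cdot\mathbf{x} = (\mathbf{x}^\mathsf{T}\cdot\mathbf{A}^\mathsf{T})^\mathsf{T} \quad\text{and}\quad (\mathbf{A}\cdot\mathbf{x})^\mathsf{T} = \mathbf{x}^\mathsf{T}\cdot\mathbf{A}^\mathsf{T}. \tag{B.16} $$

Thus, any left-sided matrix-vector product $\mathbf{A}\cdot\mathbf{x}$ can also be calculated as a right-sided product $\mathbf{x}^\mathsf{T}\cdot\mathbf{A}^\mathsf{T}$ by transposing the corresponding matrix $\mathbf{A}$ and vector $\mathbf{x}$.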
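The two kinds of matrix-vector products and the identity (B.16) can be checked with a few lines of Java. This is a sketch with hypothetical names; vectors are plain double[] arrays, so the distinction between row and column vectors is expressed only by which multiply overload is called.

```java
// Left-sided (A * x) and right-sided (x^T * B) matrix-vector products, plus transpose.
final class MatrixVector {

    /** y = A * x; A has size (m, n), x has length n (Eqn. (B.12)). */
    static double[] multiply(double[][] A, double[] x) {
        int m = A.length, n = x.length;
        double[] y = new double[m];
        for (int i = 0; i < m; i++) {
            if (A[i].length != n) throw new IllegalArgumentException("size mismatch");
            for (int j = 0; j < n; j++) y[i] += A[i][j] * x[j];
        }
        return y;
    }

    /** z = x^T * B; x has length m, B has size (m, n) (Eqn. (B.14)). */
    static double[] multiply(double[] x, double[][] B) {
        int m = x.length;
        if (B.length != m) throw new IllegalArgumentException("size mismatch");
        int n = B[0].length;
        double[] z = new double[n];
        for (int j = 0; j < n; j++)
            for (int i = 0; i < m; i++) z[j] += x[i] * B[i][j];
        return z;
    }

    static double[][] transpose(double[][] A) {
        int m = A.length, n = A[0].length;
        double[][] T = new double[n][m];
        for (int i = 0; i < m; i++)
            for (int j = 0; j < n; j++) T[j][i] = A[i][j];
        return T;
    }

    public static void main(String[] args) {
        double[][] A = {{1, 2, 3}, {4, 5, 6}};
        double[] x = {1, 0, -1};
        System.out.println(java.util.Arrays.toString(multiply(A, x)));             // [-2.0, -2.0]
        System.out.println(java.util.Arrays.toString(multiply(x, transpose(A))));  // [-2.0, -2.0]
    }
}
```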


B.3 Vector Products

Products between vectors are a common cause of confusion, mainly because the same symbol ($\cdot$) is used to denote widely different operators.

B.3.1 Dot (Scalar) Product

The dot product (also called scalar or inner product) of two vectors $\mathbf{a} = (a_0, \ldots, a_{n-1})^\mathsf{T}$, $\mathbf{b} = (b_0, \ldots, b_{n-1})^\mathsf{T}$ of the same length $n$ is defined as

$$ x = \mathbf{a}\cdot\mathbf{b} = \sum_{i=0}^{n-1} a_i\cdot b_i. \tag{B.17} $$

Thus the result $x$ is a scalar value (hence the name of this product). If we write this as the product of a row and a column vector, as in Eqn. (B.14),

$$ \underbrace{x}_{(1,1)} = \underbrace{\mathbf{a}^\mathsf{T}}_{(1,n)} \cdot \underbrace{\mathbf{b}}_{(n,1)}, \tag{B.18} $$

we conclude that the result $x$ is a matrix of size $(1, 1)$, that is, a single scalar value. The dot product can be viewed as the projection of one vector onto the other, with the relation

$$ \mathbf{a}\cdot\mathbf{b} = \|\mathbf{a}\|\cdot\|\mathbf{b}\|\cdot\cos(\alpha), \tag{B.19} $$

where $\alpha$ is the angle enclosed by the vectors $\mathbf{a}$ and $\mathbf{b}$. As a consequence, the dot product is zero if the two vectors are orthogonal to each other.

The dot product of a vector with itself gives the square of its length (see Eqn. (B.7)), that is,

$$ \mathbf{a}\cdot\mathbf{a} = \sum_{i=0}^{n-1} a_i^2 = \|\mathbf{a}\|^2. \tag{B.20} $$
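Eqns. (B.17), (B.19) and (B.20) combine naturally into three small routines: the dot product, the norm as the square root of $\mathbf{a}\cdot\mathbf{a}$, and the enclosed angle obtained by solving (B.19) for $\alpha$. The Java sketch below uses hypothetical names; the clamping step is a defensive choice against round-off pushing the cosine slightly outside $[-1, 1]$.

```java
// Dot product (B.17), Euclidean norm via (B.20), and enclosed angle via (B.19).
final class VectorOps {

    static double dot(double[] a, double[] b) {
        if (a.length != b.length) throw new IllegalArgumentException("vectors must have the same length");
        double sum = 0;
        for (int i = 0; i < a.length; i++) sum += a[i] * b[i];
        return sum;
    }

    /** ||a|| = sqrt(a . a). */
    static double norm(double[] a) {
        return Math.sqrt(dot(a, a));
    }

    /** Angle alpha between a and b (both nonzero), from a . b = ||a|| ||b|| cos(alpha). */
    static double angle(double[] a, double[] b) {
        double c = dot(a, b) / (norm(a) * norm(b));
        c = Math.max(-1.0, Math.min(1.0, c));   // guard against round-off outside [-1, 1]
        return Math.acos(c);
    }

    public static void main(String[] args) {
        System.out.println(dot(new double[] {1, 2, 3}, new double[] {4, 5, 6}));   // 32.0
        System.out.println(norm(new double[] {3, 4}));                            // 5.0
        System.out.println(dot(new double[] {1, 0}, new double[] {0, 1}));         // 0.0 (orthogonal)
        System.out.println(angle(new double[] {1, 0}, new double[] {1, 1}));       // approx. pi/4
    }
}
```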


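B.3.2 Outer Product

The outer product of two vectors $\mathbf{a} = (a_0, \ldots, a_{m-1})^\mathsf{T}$, $\mathbf{b} = (b_0, \ldots, b_{n-1})^\mathsf{T}$ of length $m$ and $n$, respectively, is defined as

$$ \mathbf{M} = \mathbf{a} \otimes \mathbf{b} = \mathbf{a}\cdot\mathbf{b}^\mathsf{T} = \begin{pmatrix} a_0 b_0 & a_0 b_1 & \cdots & a_0 b_{n-1} \\ a_1 b_0 & a_1 b_1 & \cdots & a_1 b_{n-1} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m-1} b_0 & a_{m-1} b_1 & \cdots & a_{m-1} b_{n-1} \end{pmatrix}. \tag{B.21} $$

Thus the result is a matrix $\mathbf{M}$ with $m$ rows and $n$ columns and elements $M_{i,j} = a_i\cdot b_j$, for $i = 0, \ldots, m-1$ and $j = 0, \ldots, n-1$. Note that $\mathbf{a}\cdot\mathbf{b}^\mathsf{T}$ in Eqn. (B.21) denotes the ordinary (matrix) product of the column vector $\mathbf{a}$ (of size $m \times 1$) and the row vector $\mathbf{b}^\mathsf{T}$ (of size $1 \times n$), as defined in Eqn. (B.10). The outer product is a special case of the Kronecker product ($\otimes$), which generally operates on pairs of matrices.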
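As Eqn. (B.21) shows, each element of the outer product is a single product $a_i\cdot b_j$, so a sketch needs only two nested loops. The class name below is hypothetical.

```java
// Outer product M = a (outer) b = a * b^T, an (m, n) matrix with M[i][j] = a[i] * b[j].
final class OuterProduct {

    static double[][] outer(double[] a, double[] b) {
        double[][] M = new double[a.length][b.length];
        for (int i = 0; i < a.length; i++)
            for (int j = 0; j < b.length; j++)
                M[i][j] = a[i] * b[j];
        return M;
    }

    public static void main(String[] args) {
        double[][] M = outer(new double[] {1, 2}, new double[] {3, 4, 5});
        System.out.println(java.util.Arrays.deepToString(M));   // [[3.0, 4.0, 5.0], [6.0, 8.0, 10.0]]
    }
}
```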


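B.3.3 Cross Product

Although the cross product ($\times$) is generally defined for $n$-dimensional vectors, it is almost exclusively used in the 3D case, where the result is geometrically easy to understand. For a pair of 3D vectors, $\mathbf{a} = (a_0, a_1, a_2)^\mathsf{T}$ and $\mathbf{b} = (b_0, b_1, b_2)^\mathsf{T}$, the cross product is defined as

$$ \mathbf{c} = \mathbf{a} \times \mathbf{b} = \begin{pmatrix} a_1\cdot b_2 - a_2\cdot b_1 \\ a_2\cdot b_0 - a_0\cdot b_2 \\ a_0\cdot b_1 - a_1\cdot b_0 \end{pmatrix}. \tag{B.22} $$

In the 3D case, the cross product is another 3D vector that is perpendicular to both of the original vectors.2 The magnitude (length) of the vector $\mathbf{c}$ relates to the angle $\theta$ between $\mathbf{a}$ and $\mathbf{b}$ as

$$ \|\mathbf{c}\| = \|\mathbf{a} \times \mathbf{b}\| = \|\mathbf{a}\|\cdot\|\mathbf{b}\|\cdot\sin(\theta). \tag{B.23} $$

The quantity $\|\mathbf{a} \times \mathbf{b}\|$ corresponds to the area of the parallelogram spanned by the vectors $\mathbf{a}$ and $\mathbf{b}$.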
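A direct Java transcription of Eqn. (B.22) is shown below as a sketch (hypothetical class name). The example checks the two geometric properties mentioned in the text: the result is perpendicular to both inputs, and its length equals the area of the spanned parallelogram.

```java
// 3D cross product following Eqn. (B.22).
final class CrossProduct {

    static double[] cross(double[] a, double[] b) {
        if (a.length != 3 || b.length != 3) throw new IllegalArgumentException("3D vectors required");
        return new double[] {
            a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]
        };
    }

    public static void main(String[] args) {
        double[] c = cross(new double[] {1, 2, 3}, new double[] {4, 5, 6});
        System.out.println(java.util.Arrays.toString(c));   // [-3.0, 6.0, -3.0]
        // parallelogram spanned by (2,0,0) and (0,3,0) has area 2 * 3 = 6:
        double[] d = cross(new double[] {2, 0, 0}, new double[] {0, 3, 0});
        System.out.println(Math.sqrt(d[0] * d[0] + d[1] * d[1] + d[2] * d[2]));   // 6.0
    }
}
```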


B.4 Eigenvectors and Eigenvalues

This section gives an elementary introduction to eigenvectors and eigenvalues, which are mentioned at several places in the main text (see also [27, 64]). In general, the eigenvalue problem is to find solutions $\mathbf{x} \in \mathbb{R}^n$ and $\lambda \in \mathbb{R}$ for the linear equation

$$ \mathbf{A}\cdot\mathbf{x} = \lambda\cdot\mathbf{x}, \tag{B.24} $$

with the given square matrix $\mathbf{A}$ of size $(n, n)$. Any non-trivial3 solution $\mathbf{x}$ is an eigenvector of $\mathbf{A}$ and the scalar $\lambda$ (which may be complex-valued) is the associated eigenvalue. Eigenvalues and eigenvectors thus always come in pairs $(\lambda_i, \mathbf{x}_i)$, usually called eigenpairs. Geometrically speaking, applying the matrix $\mathbf{A}$ to an eigenvector only changes the vector's magnitude or length (by the associated eigenvalue $\lambda$), but not its orientation in space (a negative $\lambda$ merely reverses its direction). Equation (B.24) can be rewritten as

$$ \mathbf{A}\cdot\mathbf{x} - \lambda\cdot\mathbf{x} = \mathbf{0} \quad\text{or}\quad (\mathbf{A} - \lambda\cdot\mathbf{I}_n)\cdot\mathbf{x} = \mathbf{0}, \tag{B.25} $$

where $\mathbf{I}_n$ is the $(n, n)$ identity matrix. This homogeneous linear equation has non-trivial solutions only if the matrix $(\mathbf{A} - \lambda\cdot\mathbf{I}_n)$ is singular, that is, its rank is less than $n$ and thus its determinant $\det()$ is zero, that is,

$$ \det(\mathbf{A} - \lambda\cdot\mathbf{I}_n) = 0. \tag{B.26} $$

Equation (B.26) is called the "characteristic equation" of the matrix $\mathbf{A}$ and can be expanded to an $n$-th order polynomial in $\lambda$. This polynomial has at most $n$ distinct roots, which are the eigenvalues of $\mathbf{A}$ (that is, solutions to Eqn. (B.26)). A matrix of size $(n, n)$ thus has up to $n$ linearly independent eigenvectors $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n$, each with an associated (not necessarily distinct) eigenvalue $\lambda_1, \lambda_2, \ldots, \lambda_n$.

2 For dimensions greater than three, the definition (and calculation) of the cross product is considerably more involved.

3 An obvious but trivial solution is $\mathbf{x} = \mathbf{0}$ (where $\mathbf{0}$ denotes the zero-vector).

If they exist, the eigenvalues of a matrix are unique, but the associated eigenvectors are not! This results from the fact that, if Eqn. (B.24) is satisfied for a vector $\mathbf{x}$ (and the associated eigenvalue $\lambda$), it also applies to any scaled vector $s\mathbf{x}$, that is,

$$ \mathbf{A}\cdot s\mathbf{x} = \lambda\cdot s\mathbf{x}, \tag{B.27} $$

for arbitrary $s \in \mathbb{R}$ (and $s \neq 0$). Thus, if $\mathbf{x}$ is an eigenvector of $\mathbf{A}$, then $s\mathbf{x}$ is also an (equivalent) eigenvector.

Note that the eigenvalues of a real-valued matrix may generally be complex. However, as an important special case, if the matrix $\mathbf{A}$ is real and symmetric, all its eigenvalues are guaranteed to be real.


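Example

For the real-valued (non-symmetric) $2 \times 2$ matrix

$$ \mathbf{A} = \begin{pmatrix} 3 & -2 \\ -4 & 1 \end{pmatrix}, $$

the two eigenvalues and their associated eigenvectors are

$$ \lambda_1 = 5,\quad \mathbf{x}_1 = s\cdot\begin{pmatrix} 4 \\ -4 \end{pmatrix}, \qquad\text{and}\qquad \lambda_2 = -1,\quad \mathbf{x}_2 = s\cdot\begin{pmatrix} -2 \\ -4 \end{pmatrix}, $$

for any nonzero $s \in \mathbb{R}$. The result can be easily verified by inserting the pairs $(\lambda_1, \mathbf{x}_1)$ and $(\lambda_2, \mathbf{x}_2)$, respectively, into Eqn. (B.24).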


B.4.1  Calculation  of  Eigenvalues 
Special  case:  2x2  matrix 

For  the  special  (but  frequent)  case  of  n  =  2,  the  solution  can  be  found 
in  closed  form  (and  without  any  software  libraries).  In  this  case,  the 
characteristic  equation  (Eqn.  (B.26))  reduces  to 
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1:  RealEigenValues2x2  ( A ,  B,  C,  D) 

Input:  A,  B,C,  D  £  R,  the  elements  of  a  real-valued  2x2  ma¬ 
trix  A  =  (c  d)  ■  Ref  ^ urns  an  ordered  sequence  of  real- valued 
eigenpairs  (A^,aq)  for  A,  or  nil  if  the  matrix  has  no  real-valued 
eigenvalues. 


2:  R  4r- 

3:  S  i-  ^ 

4:  if  ( S 2  +  B  ■  C)  <  0  then 

5:  return  nil 

6:  else 

7:  T  <—  VS2  +  B-C 

8:  A  1^R  +  T 

9:  A  2^R-T 

10:  if  (A  —  D)  >  0  then 

11:  x1  <-  {S  +  T,C)t 

12:  x2  <-  (B,  -S  -  T ) 

13:  else 

14:  x1  <—  (B,  —S  +  T) 

15:  x2  <r-  (S-T,C) 7 

16:  return  ((Ai,*!),  (A2, *2)) 


>  A  has  no  real- valued  eigenvalues 


>  eigenvalue  X1 

>  eigenvalue  A2 

D>  eigenvector  x1 

>  eigenvector  x2 

>  eigenvector  xx 

>  eigenvector  x2 

c>  X1  ^  A2 


B.4  Eigenvectors  and 
Eigenvalues 

Alg.  B.l 

Calculating  the  real  eigenval¬ 
ues  and  eigenvectors  for  a  2  X  2 
real-valued  matrix  A.  If  the 
matrix  has  real  eigenvalues, 
an  ordered  sequence  of  two 
“eigenpairs”  (A^aq),  each  con¬ 
taining  the  eigenvalue  and 
the  associated  eigenvector  xi, 
is  returned  (i  =  1,2).  The 
resulting  sequence  is  ordered 
by  decreasing  eigenvalues,  nil 
is  returned  if  A  has  no  real 
eigenvalues. 


det(A  —  A  •  I2) 


A  B 
C  D 


A 


1  0 
0  1 


A- A  B 
C  D- A 


=  A2  -  (A  +  D)  ■  A  +  (AD  -  BC)  =  0  . 


(B.28) 

(B.29) 


The  two  possible  solutions  to  this  quadratic  equation, 


a,s  -  m  ± 


2 

A  +  D 


± 


A  +  D\  2 


A-D\‘2 


(AD  -  BC) 


I  1/2 


+  BC 


1/2 


=  R±  VS2  +  BC, 


are  the  eigenvalues  of  the  matrix  A,  with 

A  1=R+  s/S'2  +  B  ■  C, 
A 2=  R-  US'2  +  B  •  C. 


(B.30) 


(B.31) 


Both λ1 and λ2 are real-valued if the term under the square root is non-negative, that is, if

$$S^2 + B\cdot C = \left(\frac{A-D}{2}\right)^2 + B\cdot C \geq 0 . \tag{B.32}$$

In particular, if the matrix is symmetric (i.e., B = C), this condition is guaranteed (because B · C ≥ 0). In this case, λ1 ≥ λ2. Algorithm B.1⁴ summarizes the closed-form computation of the eigenvalues and eigenvectors of a 2 × 2 matrix.
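The closed-form procedure of Alg. B.1 maps directly to plain Java. The following is a minimal sketch, assuming a hypothetical class RealEigen2x2 with a nested EigenPair type (neither belongs to any library); it returns null in place of nil, and, as in the algorithm, the eigenvectors are not normalized.

```java
import java.util.Arrays;

/** Hypothetical helper mirroring Alg. B.1: real eigenpairs of a 2x2 matrix [[A, B], [C, D]]. */
public class RealEigen2x2 {

    /** One eigenpair: eigenvalue lambda and associated (non-normalized) eigenvector x. */
    public static final class EigenPair {
        public final double lambda;
        public final double[] x;

        EigenPair(double lambda, double[] x) {
            this.lambda = lambda;
            this.x = x;
        }

        @Override
        public String toString() {
            return "(" + lambda + ", " + Arrays.toString(x) + ")";
        }
    }

    /** Returns {pair1, pair2} with lambda1 >= lambda2, or null if there are no real eigenvalues. */
    public static EigenPair[] realEigenValues2x2(double A, double B, double C, double D) {
        double R = (A + D) / 2;
        double S = (A - D) / 2;
        double rho = S * S + B * C;
        if (rho < 0) {
            return null;                          // matrix has no real-valued eigenvalues
        }
        double T = Math.sqrt(rho);
        double lambda1 = R + T;
        double lambda2 = R - T;
        double[] x1, x2;
        if (A - D > 0) {
            x1 = new double[] {S + T, C};
            x2 = new double[] {B, -S - T};
        } else {
            x1 = new double[] {B, -S + T};
            x2 = new double[] {S - T, C};
        }
        return new EigenPair[] {new EigenPair(lambda1, x1), new EigenPair(lambda2, x2)};
    }

    public static void main(String[] args) {
        // symmetric example [[2, 1], [1, 2]]
        System.out.println(Arrays.toString(realEigenValues2x2(2, 1, 1, 2)));
    }
}
```

For the symmetric matrix [[2, 1], [1, 2]] the sketch yields λ1 = 3 with x1 = (1, 1)ᵀ and λ2 = 1 with x2 = (−1, 1)ᵀ; for a rotation such as [[0, −1], [1, 0]] it returns null.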

4 See [27] and its reprint in [28, Ch. 5].

General case: n × n

In  general,  proven  numerical  software  should  be  used  for  eigenvalue 
calculations.  See  the  example  using  the  Apache  Commons  Math 
library  in  Sec.  B.6.5. 


B.5  Homogeneous  Coordinates 


Homogeneous coordinates are an alternative representation of points in multi-dimensional space. They are commonly used in 2D and 3D geometry because they can greatly simplify the description of certain transformations. For example, affine and projective transformations become linear mappings (i.e., matrix multiplications) in homogeneous coordinates, and the composition of transformations can be performed by simple matrix multiplication.⁵

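To convert a given n-dimensional Cartesian point $\mathbf{x} = (x_0, \ldots, x_{n-1})^\top$ to homogeneous coordinates $\underline{\mathbf{x}}$, we use the notation⁶

$$\mathrm{hom}(\mathbf{x}) = \underline{\mathbf{x}} . \tag{B.33}$$

This operation increases the dimensionality of the original vector by one by inserting the additional element 1, that is,

$$\mathrm{hom}\begin{pmatrix} x_0\\ \vdots\\ x_{n-1}\end{pmatrix} = \begin{pmatrix} \underline{x}_0\\ \vdots\\ \underline{x}_{n-1}\\ \underline{x}_n\end{pmatrix} = \begin{pmatrix} x_0\\ \vdots\\ x_{n-1}\\ 1\end{pmatrix} . \tag{B.34}$$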


Note that the homogeneous representation of a Cartesian vector is not unique: every nonzero multiple of the homogeneous vector is an equivalent representation of x. Thus any scaled homogeneous vector $\underline{\mathbf{x}}' = s\cdot\underline{\mathbf{x}}$ (with s ∈ ℝ, s ≠ 0) corresponds to the same Cartesian vector (see also Eqn. (B.39)).

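To convert a given homogeneous point $\underline{\mathbf{x}} = (\underline{x}_0, \ldots, \underline{x}_n)^\top$ back to Cartesian coordinates x we simply write

$$\mathrm{hom}^{-1}(\underline{\mathbf{x}}) = \mathbf{x} . \tag{B.35}$$

This operation can be easily derived as

$$\mathrm{hom}^{-1}\begin{pmatrix} \underline{x}_0\\ \vdots\\ \underline{x}_{n-1}\\ \underline{x}_n\end{pmatrix} = \frac{1}{\underline{x}_n}\cdot\begin{pmatrix} \underline{x}_0\\ \vdots\\ \underline{x}_{n-1}\end{pmatrix} = \begin{pmatrix} x_0\\ \vdots\\ x_{n-1}\end{pmatrix} , \tag{B.36}$$

provided that $\underline{x}_n \neq 0$. Two homogeneous points $\underline{\mathbf{x}}_1, \underline{\mathbf{x}}_2$ are considered equivalent (≡) if they represent the same Cartesian point, that is,

$$\underline{\mathbf{x}}_1 \equiv \underline{\mathbf{x}}_2 \;\Leftrightarrow\; \mathrm{hom}^{-1}(\underline{\mathbf{x}}_1) = \mathrm{hom}^{-1}(\underline{\mathbf{x}}_2) . \tag{B.37}$$

It follows from Eqn. (B.36) that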

5  See  Chapter  21,  Sec.  21.1.2. 

6  The  operator  hom( )  is  introduced  here  for  convenience  and  clarity. 
$$\mathrm{hom}^{-1}(\underline{\mathbf{x}}) = \mathrm{hom}^{-1}(s\cdot\underline{\mathbf{x}}) \tag{B.38}$$

for any nonzero factor s ∈ ℝ. Thus, as mentioned earlier, any scaled homogeneous point corresponds to the same Cartesian point, that is,

$$\underline{\mathbf{x}} \equiv s\cdot\underline{\mathbf{x}} . \tag{B.39}$$

For example, for the Cartesian point $\mathbf{x} = (3, 7, 2)^\top$, the homogeneous coordinates

$$\mathrm{hom}(\mathbf{x}) = \begin{pmatrix} 3\\ 7\\ 2\\ 1\end{pmatrix} \equiv \begin{pmatrix} -3\\ -7\\ -2\\ -1\end{pmatrix} \equiv \begin{pmatrix} 6\\ 14\\ 4\\ 2\end{pmatrix} \equiv \begin{pmatrix} 30\\ 70\\ 20\\ 10\end{pmatrix} \tag{B.40}$$

are all equivalent. Homogeneous coordinates can be used for vector spaces of arbitrary dimension, including 2D coordinates.
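As a minimal plain-Java sketch of Eqns. (B.34), (B.36), and (B.37), the hypothetical helper class below converts between Cartesian and homogeneous coordinates. The method names hom and homInv and the tolerance-based equivalence test are illustrative assumptions, not part of the Apache Commons Math API.

```java
import java.util.Arrays;

/** Hypothetical helpers for Eqns. (B.34), (B.36) and (B.37). */
public class Homogeneous {

    /** hom(x): appends the element 1 to a Cartesian vector. */
    public static double[] hom(double[] x) {
        double[] xh = Arrays.copyOf(x, x.length + 1);  // copy of x, padded with one extra element
        xh[x.length] = 1.0;
        return xh;
    }

    /** hom^-1(xh): divides by the last element and drops it (requires xh[n] != 0). */
    public static double[] homInv(double[] xh) {
        int n = xh.length - 1;
        double w = xh[n];
        if (w == 0) {
            throw new IllegalArgumentException("last homogeneous coordinate is zero");
        }
        double[] x = new double[n];
        for (int i = 0; i < n; i++) {
            x[i] = xh[i] / w;
        }
        return x;
    }

    /** Equivalence test of Eqn. (B.37), with a small tolerance eps for rounding errors. */
    public static boolean equivalent(double[] xh1, double[] xh2, double eps) {
        double[] a = homInv(xh1), b = homInv(xh2);
        if (a.length != b.length) return false;
        for (int i = 0; i < a.length; i++) {
            if (Math.abs(a[i] - b[i]) > eps) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        double[] xh = hom(new double[] {3, 7, 2});
        double[] scaled = {-6, -14, -4, -2};                 // (-2) * xh
        System.out.println(Arrays.toString(xh));            // prints [3.0, 7.0, 2.0, 1.0]
        System.out.println(equivalent(xh, scaled, 1e-12));  // prints true
    }
}
```

The tolerance parameter is a design choice for floating-point data; for exactly representable values, as in this example, an exact comparison would also work.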


B.6  Basic  Matrix-Vector  Operations  with  the 
Apache  Commons  Math  Library 

It  is  recommended  to  use  proven  standard  software,  such  as  the 
Apache  Commons  Math7  (ACM)  library,  for  any  non-trivial  linear 
algebra  calculation. 

B.6.1  Vectors  and  Matrices 

The basic data structures for representing vectors and matrices are RealVector and RealMatrix, respectively. The following ACM examples show the conversion from and to simple Java arrays of element type double:

import org.apache.commons.math3.linear.MatrixUtils;
import org.apache.commons.math3.linear.RealMatrix;
import org.apache.commons.math3.linear.RealVector;

// Data given as simple arrays:
double[] xa = {1, 2, 3};
double[][] Aa = {{2, 0, 1}, {0, 2, 0}, {1, 0, 2}};

// Conversion to vectors and matrices:
RealVector x = MatrixUtils.createRealVector(xa);
RealMatrix A = MatrixUtils.createRealMatrix(Aa);

// Get a single matrix element A_{i,j}:
int i = 0, j = 2;      // specify row (i) and column (j), example values
double aij = A.getEntry(i, j);

// Set a single matrix element to a new value:
double value = 5.0;    // arbitrary example value
A.setEntry(i, j, value);

// Extract data to arrays again:
double[] xb = x.toArray();
double[][] Ab = A.getData();

// Transpose the matrix A:
RealMatrix At = A.transpose();

7 http://commons.apache.org/math/.


B.6.2 Matrix-Vector Multiplication

The following examples show how to implement the various matrix-vector products described in Sec. B.2.3.

RealMatrix A = ...;   // matrix A of size (m, n)
RealMatrix B = ...;   // matrix B of size (p, q), with p = n
RealVector x = ...;   // vector x of length n

// Scalar multiplication C ← s · A:
double s = ...;
RealMatrix C = A.scalarMultiply(s);

// Product of two matrices: C ← A · B:
RealMatrix C = A.multiply(B);   // C is of size (m, q)

// Left-sided matrix-vector product: y ← A · x:
RealVector y = A.operate(x);

// Right-sided matrix-vector product: y ← xᵀ · A:
RealVector y = A.preMultiply(x);   // here x must be of length m (the number of rows of A)


B.6.3  Vector  Products 

The  following  code  segments  show  the  use  of  the  ACM  library  for 
calculating  various  vector  products  described  in  Sec.  B.3. 

RealVector a, b;   // vectors a, b (both of length n)

// Multiplication by a scalar c ← s · a:
double s = ...;
RealVector c = a.mapMultiply(s);

// Dot (scalar) product x ← a · b:
double x = a.dotProduct(b);

// Outer product M ← a ⊗ b:
RealMatrix M = a.outerProduct(b);

B.6.4  Inverse  of  a  Square  Matrix 

The  following  example  shows  the  inversion  of  a  square  matrix: 

RealMatrix A = ...;   // a square matrix
RealMatrix Ai = MatrixUtils.inverse(A);


B.6.5  Eigenvalues  and  Eigenvectors 

The following code segment illustrates the calculation of eigenvalues and eigenvectors of a square matrix A using the class EigenDecomposition of the Apache Commons Math API. Note that the eigenvalues returned by getRealEigenvalues() are sorted in non-increasing order. The same ordering applies to the associated eigenvectors.

import org.apache.commons.math3.linear.EigenDecomposition;

RealMatrix A = MatrixUtils.createRealMatrix(new double[][]
    {{2, 0, 1},
     {0, 2, 0},
     {1, 0, 2}});

EigenDecomposition ed = new EigenDecomposition(A);
if (ed.hasComplexEigenvalues()) {
    System.out.println("A has complex eigenvalues!");
}
else {
    // get all real eigenvalues:
    double[] lambda = ed.getRealEigenvalues();   // = (3, 2, 1)
    // get the associated eigenvectors:
    for (int i = 0; i < lambda.length; i++) {
        RealVector x = ed.getEigenvector(i);
        // ... use the eigenpair (lambda[i], x)
    }
}


B.7  Solving  Systems  of  Linear  Equations 


This  section  describes  standard  methods  for  solving  systems  of  linear 
equations.  Such  systems  appear  widely  and  frequently  in  all  sorts  of 
engineering  problems.  Identifying  them  and  knowing  about  standard 
solution  methods  is  thus  quite  important  and  may  save  much  time 
in  any  development  process.  In  addition,  the  solution  techniques 
presented here are very mature and numerically stable. This section gives only a brief summary of the topic and practical implementations using the Apache Commons Math library. Further details and the underlying theory can be found in most linear algebra textbooks (e.g., [145, 190]).

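Systems of linear equations generally come in the form

$$\begin{pmatrix} A_{0,0} & A_{0,1} & \cdots & A_{0,n-1}\\ A_{1,0} & A_{1,1} & \cdots & A_{1,n-1}\\ A_{2,0} & A_{2,1} & \cdots & A_{2,n-1}\\ \vdots & \vdots & \ddots & \vdots\\ A_{m-1,0} & A_{m-1,1} & \cdots & A_{m-1,n-1} \end{pmatrix} \cdot \begin{pmatrix} x_0\\ x_1\\ \vdots\\ x_{n-1}\end{pmatrix} = \begin{pmatrix} b_0\\ b_1\\ b_2\\ \vdots\\ b_{m-1}\end{pmatrix} \tag{B.41}$$

or, in the standard notation,

$$\mathbf{A}\cdot\mathbf{x} = \mathbf{b} , \tag{B.42}$$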


where the (known) matrix A is of size (m, n), the unknown vector x is n-dimensional, and the (known) vector b is m-dimensional. Thus n corresponds to the number of unknowns and m to the number of equations. Each row i of the matrix A thus represents a single equation

$$A_{i,0}\cdot x_0 + A_{i,1}\cdot x_1 + \cdots + A_{i,n-1}\cdot x_{n-1} = b_i \tag{B.43}$$

or

$$\sum_{j=0}^{n-1} A_{i,j}\cdot x_j = b_i \tag{B.44}$$

for i = 0, …, m−1. Depending on m and n, the following situations may occur:

•  If  m  =  n  (i.e.,  A  is  square)  the  number  of  unknowns  matches  the 
number  of  equations  and  the  system  typically  (but  not  always, 
of  course)  has  a  unique  solution  (see  Sec.  B.7.1  below). 

• If m < n, we have more unknowns than equations. In this case no unique solution exists: the system has either no solution or infinitely many.

• With m > n the system is said to be over-determined and thus not solvable in general. Nevertheless, this is a frequent case that is typically handled by calculating a least-squares solution (see Sec. B.7.2).


B.7.1  Exact  Solutions 

If the number of equations (m) is equal to the number of unknowns (n) and the resulting (square) matrix A is non-singular (i.e., of full rank m = n), the system A · x = b can be expected to have a unique solution for x. For example, the system⁸

$$\begin{aligned} 2\cdot x_0 + 3\cdot x_1 - 2\cdot x_2 &= 1,\\ -x_0 + 7\cdot x_1 + 6\cdot x_2 &= -2,\\ 4\cdot x_0 - 3\cdot x_1 - 5\cdot x_2 &= 1, \end{aligned} \tag{B.45}$$

with

$$\mathbf{A} = \begin{pmatrix} 2 & 3 & -2\\ -1 & 7 & 6\\ 4 & -3 & -5\end{pmatrix}, \qquad \mathbf{x} = \begin{pmatrix} x_0\\ x_1\\ x_2\end{pmatrix}, \qquad \mathbf{b} = \begin{pmatrix} 1\\ -2\\ 1\end{pmatrix}, \tag{B.46}$$

has the unique solution x = (−0.3698, 0.1780, −0.6027)ᵀ. The following code segment shows how the previous example is solved using class LUDecomposition of the ACM library:

import org.apache...linear.DecompositionSolver;
import org.apache...linear.LUDecomposition;

RealMatrix A = MatrixUtils.createRealMatrix(new double[][]
    {{ 2,  3, -2},
     {-1,  7,  6},
     { 4, -3, -5}});
RealVector b = MatrixUtils.createRealVector(new double[]
    {1, -2, 1});
DecompositionSolver solver =
    new LUDecomposition(A).getSolver();
RealVector x = solver.solve(b);

An exception is thrown if the matrix A is non-square or singular.

8 Example taken from the Apache Commons Math User Guide [4].
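To show what a DecompositionSolver does for a square system, the following plain-Java sketch solves the same example without any library. It uses Gaussian elimination with partial pivoting, the elimination scheme underlying LU decomposition, but it is a hypothetical illustration (the class GaussSolve is not part of ACM) and not ACM's actual implementation; the singularity threshold 1e-12 is an arbitrary assumption.

```java
import java.util.Arrays;

/** Hypothetical plain-Java solver for square systems A x = b (no ACM dependency). */
public class GaussSolve {

    /** Solves A x = b by Gaussian elimination with partial pivoting; A and b are not modified. */
    public static double[] solve(double[][] A, double[] b) {
        int n = b.length;
        double[][] M = new double[n][];
        for (int i = 0; i < n; i++) {
            if (A[i].length != n) throw new IllegalArgumentException("matrix is not square");
            M[i] = Arrays.copyOf(A[i], n + 1);   // augmented row [A_i | b_i]
            M[i][n] = b[i];
        }
        for (int k = 0; k < n; k++) {
            // choose the row with the largest |pivot| in column k
            int p = k;
            for (int i = k + 1; i < n; i++) {
                if (Math.abs(M[i][k]) > Math.abs(M[p][k])) p = i;
            }
            if (Math.abs(M[p][k]) < 1e-12) throw new ArithmeticException("matrix is (nearly) singular");
            double[] tmp = M[k]; M[k] = M[p]; M[p] = tmp;
            // eliminate column k below the pivot
            for (int i = k + 1; i < n; i++) {
                double f = M[i][k] / M[k][k];
                for (int j = k; j <= n; j++) M[i][j] -= f * M[k][j];
            }
        }
        // back substitution on the upper-triangular system
        double[] x = new double[n];
        for (int i = n - 1; i >= 0; i--) {
            double s = M[i][n];
            for (int j = i + 1; j < n; j++) s -= M[i][j] * x[j];
            x[i] = s / M[i][i];
        }
        return x;
    }

    public static void main(String[] args) {
        double[][] A = {{2, 3, -2}, {-1, 7, 6}, {4, -3, -5}};
        double[] b = {1, -2, 1};
        System.out.println(Arrays.toString(solve(A, b)));
    }
}
```

Applied to Eqn. (B.45), the sketch returns x ≈ (−0.3698, 0.1781, −0.6027)ᵀ; the exact solution is x = (−27, 13, −44)ᵀ/73.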
B.7.2 Over-Determined System (Least-Squares Solutions)

If a system of linear equations has more equations than unknowns (i.e., m > n), it is over-determined and thus, in general, has no exact solution. In other words, there is no vector x that satisfies A · x = b or

$$\mathbf{A}\cdot\mathbf{x} - \mathbf{b} = \mathbf{0} . \tag{B.47}$$

Instead, any x plugged into Eqn. (B.47) yields some non-zero "residual" vector ε, such that

$$\mathbf{A}\cdot\mathbf{x} - \mathbf{b} = \boldsymbol{\epsilon} . \tag{B.48}$$

A "best" solution is commonly found by minimizing the squared norm of this residual, that is, by searching for x such that

$$\|\mathbf{A}\cdot\mathbf{x} - \mathbf{b}\|^2 \rightarrow \min . \tag{B.49}$$


Several matrix decompositions can be used for calculating the "least-squares solution" of an over-determined system of linear equations. As a simple example, we add a fourth equation (m = 4) to the system in Eqns. (B.45) and (B.46), that is,

$$\begin{pmatrix} 2 & 3 & -2\\ -1 & 7 & 6\\ 4 & -3 & -5\\ 2 & -2 & -1\end{pmatrix} \cdot \begin{pmatrix} x_0\\ x_1\\ x_2\end{pmatrix} = \begin{pmatrix} 1\\ -2\\ 1\\ 0\end{pmatrix} , \tag{B.50}$$

without changing the number of unknowns (n = 3). The least-squares solution to this over-determined system is (approx.) x = (−0.2339, 0.1157, −0.4942)ᵀ. The following code segment shows the calculation using the SingularValueDecomposition class of the ACM library:


import org.apache...linear.DecompositionSolver;
import org.apache...linear.SingularValueDecomposition;

RealMatrix A = MatrixUtils.createRealMatrix(new double[][]
    {{ 2,  3, -2},
     {-1,  7,  6},
     { 4, -3, -5},
     { 2, -2, -1}});
RealVector b = MatrixUtils.createRealVector(new double[]
    {1, -2, 1, 0});
DecompositionSolver solver =
    new SingularValueDecomposition(A).getSolver();
RealVector x = solver.solve(b);

Alternatively, an instance of QRDecomposition could be used for calculating the least-squares solution. If an exact solution exists (see Sec. B.7.1), it is the same as the least-squares solution (with zero residual ε = 0).
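As a library-free alternative, the least-squares solution can also be sketched via the normal equations (AᵀA)·x = Aᵀ·b, here solved with a Cholesky factorization of AᵀA. This is a swapped-in technique, not what SingularValueDecomposition or QRDecomposition do: forming AᵀA squares the condition number of A, so the sketch is numerically less robust and assumes that A has full column rank. The class name NormalEquations is hypothetical.

```java
import java.util.Arrays;

/** Hypothetical least-squares sketch via the normal equations (A^T A) x = A^T b. */
public class NormalEquations {

    public static double[] leastSquares(double[][] A, double[] b) {
        int m = A.length, n = A[0].length;
        // form M = A^T A (n x n, symmetric) and c = A^T b
        double[][] M = new double[n][n];
        double[] c = new double[n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                for (int k = 0; k < m; k++) M[i][j] += A[k][i] * A[k][j];
            }
            for (int k = 0; k < m; k++) c[i] += A[k][i] * b[k];
        }
        // Cholesky factorization M = L L^T (requires A to have full column rank)
        double[][] L = new double[n][n];
        for (int j = 0; j < n; j++) {
            double d = M[j][j];
            for (int k = 0; k < j; k++) d -= L[j][k] * L[j][k];
            if (d <= 0) throw new ArithmeticException("A^T A is not positive definite");
            L[j][j] = Math.sqrt(d);
            for (int i = j + 1; i < n; i++) {
                double s = M[i][j];
                for (int k = 0; k < j; k++) s -= L[i][k] * L[j][k];
                L[i][j] = s / L[j][j];
            }
        }
        // forward substitution L y = c
        double[] y = new double[n];
        for (int i = 0; i < n; i++) {
            double s = c[i];
            for (int k = 0; k < i; k++) s -= L[i][k] * y[k];
            y[i] = s / L[i][i];
        }
        // back substitution L^T x = y (L^T[i][k] = L[k][i])
        double[] x = new double[n];
        for (int i = n - 1; i >= 0; i--) {
            double s = y[i];
            for (int k = i + 1; k < n; k++) s -= L[k][i] * x[k];
            x[i] = s / L[i][i];
        }
        return x;
    }

    public static void main(String[] args) {
        double[][] A = {{2, 3, -2}, {-1, 7, 6}, {4, -3, -5}, {2, -2, -1}};
        double[] b = {1, -2, 1, 0};
        System.out.println(Arrays.toString(leastSquares(A, b)));
    }
}
```

For the system in Eqn. (B.50), the result agrees with the least-squares solution given above to within 10⁻⁴; the exact value is x = (−2997, 1483, −6332)ᵀ/12811.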
This part outlines selected topics from calculus that may serve as a
useful  supplement  to  Chapters  6,  16,  17,  24,  and  25,  in  particular. 


C.1  Parabolic  Fitting 

Given a single-variable (1D), discrete function g: ℤ → ℝ, it is sometimes useful to locally fit a quadratic (parabolic) function, for example, for precisely locating a maximum or minimum position.

C.1.1  Fitting  a  Parabolic  Function  to  Three  Sample  Points 


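For a quadratic function (second-order polynomial)

$$y = f(x) = a\cdot x^2 + b\cdot x + c \tag{C.1}$$

with parameters a, b, c to pass through a given set of three sample points $\mathbf{p}_i = (x_i, y_i)$, i = 1, 2, 3, means that the following three equations must be satisfied:

$$\begin{aligned} y_1 &= a\cdot x_1^2 + b\cdot x_1 + c,\\ y_2 &= a\cdot x_2^2 + b\cdot x_2 + c,\\ y_3 &= a\cdot x_3^2 + b\cdot x_3 + c. \end{aligned} \tag{C.2}$$

Written in the standard matrix form A · x = b, or

$$\begin{pmatrix} x_1^2 & x_1 & 1\\ x_2^2 & x_2 & 1\\ x_3^2 & x_3 & 1 \end{pmatrix} \cdot \begin{pmatrix} a\\ b\\ c \end{pmatrix} = \begin{pmatrix} y_1\\ y_2\\ y_3 \end{pmatrix} , \tag{C.3}$$

the unknown coefficient vector x = (a, b, c)ᵀ is directly found as

$$\mathbf{x} = \mathbf{A}^{-1}\cdot\mathbf{b} = \begin{pmatrix} x_1^2 & x_1 & 1\\ x_2^2 & x_2 & 1\\ x_3^2 & x_3 & 1 \end{pmatrix}^{-1} \cdot \begin{pmatrix} y_1\\ y_2\\ y_3 \end{pmatrix} , \tag{C.4}$$

assuming that the matrix A has a non-zero determinant. Since $\det(\mathbf{A}) = (x_1 - x_2)\cdot(x_1 - x_3)\cdot(x_2 - x_3)$, this means that the x-coordinates of the points $\mathbf{p}_i$ must be pairwise distinct. (If the three points are collinear, the solution is still unique but yields a = 0, i.e., a straight line.)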


© Springer-Verlag London 2016
Fig. C.1 Fitting a quadratic function to three arbitrary sample points.


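Example:

Fitting the sample points p1 = (−2, 5)ᵀ, p2 = (−1, 6)ᵀ, p3 = (3, −10)ᵀ to a quadratic function, the equation to solve is (analogous to Eqn. (C.3))

$$\begin{pmatrix} 4 & -2 & 1\\ 1 & -1 & 1\\ 9 & 3 & 1 \end{pmatrix} \cdot \begin{pmatrix} a\\ b\\ c \end{pmatrix} = \begin{pmatrix} 5\\ 6\\ -10 \end{pmatrix} ,$$

with the solution

$$\begin{pmatrix} a\\ b\\ c \end{pmatrix} = \frac{1}{20}\cdot\begin{pmatrix} 4 & -5 & 1\\ -8 & 5 & 3\\ -12 & 30 & 2 \end{pmatrix} \cdot \begin{pmatrix} 5\\ 6\\ -10 \end{pmatrix} = \begin{pmatrix} -1\\ -2\\ 5 \end{pmatrix} . \tag{C.5}$$

Thus a = −1, b = −2, c = 5, and the equation of the quadratic fitting function is y = −x² − 2x + 5. The result for this example is shown graphically in Fig. C.1.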
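Instead of inverting A explicitly as in Eqn. (C.4), a 3 × 3 system can also be solved by Cramer's rule. The following plain-Java sketch (the class name ParabolaFit is hypothetical) fits y = a·x² + b·x + c through three points this way; Cramer's rule is a swapped-in technique that is convenient for such tiny systems but not advisable for larger ones.

```java
/** Hypothetical sketch of Eqns. (C.3)-(C.4): fit y = a x^2 + b x + c through three points. */
public class ParabolaFit {

    /** Determinant of a 3x3 matrix (cofactor expansion along the first row). */
    private static double det3(double[][] M) {
        return M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
             - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
             + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]);
    }

    /** Returns {a, b, c}; the x-coordinates must be pairwise distinct. */
    public static double[] fit(double[] xs, double[] ys) {
        double[][] A = new double[3][];
        for (int i = 0; i < 3; i++) {
            A[i] = new double[] {xs[i] * xs[i], xs[i], 1};   // row (x_i^2, x_i, 1)
        }
        double D = det3(A);
        if (D == 0) throw new IllegalArgumentException("x-coordinates must be distinct");
        double[] coeffs = new double[3];
        for (int k = 0; k < 3; k++) {
            double[][] Ak = new double[3][];
            for (int i = 0; i < 3; i++) {
                Ak[i] = A[i].clone();
                Ak[i][k] = ys[i];      // replace column k by the right-hand side
            }
            coeffs[k] = det3(Ak) / D;  // Cramer's rule
        }
        return coeffs;
    }

    public static void main(String[] args) {
        double[] abc = fit(new double[] {-2, -1, 3}, new double[] {5, 6, -10});
        System.out.println(java.util.Arrays.toString(abc));   // prints [-1.0, -2.0, 5.0]
    }
}
```

For collinear input such as (0, 1), (1, 3), (2, 5), the same code returns a = 0, b = 2, c = 1, i.e., the straight line through the points.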




C.1.2  Locating  Extrema  by  Quadratic  Interpolation 

A special situation is when the given points are positioned at x1 = −1, x2 = 0, and x3 = +1. This is useful, for example, to estimate a continuous extremum position from successive discrete function values defined on a regular lattice. Again the objective is to fit a quadratic function (as in Eqn. (C.1)) to pass through the points p1 = (−1, y1)ᵀ, p2 = (0, y2)ᵀ, and p3 = (1, y3)ᵀ. In this case, the simultaneous equations in Eqn. (C.2) simplify to

$$\begin{aligned} y_1 &= a - b + c,\\ y_2 &= c,\\ y_3 &= a + b + c, \end{aligned} \tag{C.6}$$

with the solution

$$a = \frac{y_1 - 2\cdot y_2 + y_3}{2}, \qquad b = \frac{y_3 - y_1}{2}, \qquad c = y_2 . \tag{C.7}$$


To estimate a local extremum position, we take the first derivative of the quadratic fitting function (Eqn. (C.1)), which is the linear function f′(x) = 2a · x + b, and find the position x̆ of its (single) root by solving

$$2a\cdot\breve{x} + b = 0 . \tag{C.8}$$

With a, b taken from Eqn. (C.7), the extremal position is thus found as

$$\breve{x} = \frac{-b}{2a} = \frac{y_1 - y_3}{2\cdot(y_1 - 2\cdot y_2 + y_3)} , \tag{C.9}$$

provided that a ≠ 0.


The corresponding extremal value can then be found by evaluating the quadratic function f() at position x̆, that is,

$$\breve{y} = f(\breve{x}) = a\cdot\breve{x}^2 + b\cdot\breve{x} + c , \tag{C.10}$$

with a, b, c as defined in Eqn. (C.7). Figure C.2 shows an example with sample points p1 = (−1, −2)ᵀ, p2 = (0, 7)ᵀ, p3 = (1, 6)ᵀ. In this case, the interpolated maximum position is at x̆ = 0.4 and the corresponding maximum value is f(x̆) = 7.8.




Using the above scheme, we can interpolate any triplet of successive sample values centered around some position u ∈ ℤ, that is, p1 = (u−1, y1)ᵀ, p2 = (u, y2)ᵀ, p3 = (u+1, y3)ᵀ, with arbitrary values y1, y2, y3. In this case the estimated position of the extremum is simply (from Eqn. (C.9))

$$\breve{x} = u + \frac{y_1 - y_3}{2\cdot(y_1 - 2\cdot y_2 + y_3)} . \tag{C.11}$$


The  application  of  quadratic  interpolation  to  multi- variable  functions 
is  described  in  Sec.  C.3.3. 
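Eqns. (C.7), (C.10), and (C.11) translate into a few lines of code. The sketch below (class and method names are hypothetical) returns both the interpolated position x̆ and the value y̆. It evaluates the parabola in coordinates relative to u, which is equivalent to Eqn. (C.10) because the fit is set up at the offsets −1, 0, +1 around u. Returning null when a = 0 (collinear samples) is a design assumption, since no unique extremum exists in that case.

```java
/** Hypothetical sketch of Eqns. (C.7), (C.10), (C.11): sub-sample extremum from three samples. */
public class QuadraticPeak {

    /**
     * Fits a parabola through (u-1, y1), (u, y2), (u+1, y3) and returns {xExt, yExt},
     * or null if the samples are collinear (a == 0).
     */
    public static double[] extremum(int u, double y1, double y2, double y3) {
        double a = (y1 - 2 * y2 + y3) / 2;        // Eqn. (C.7)
        double b = (y3 - y1) / 2;
        double c = y2;
        if (a == 0) return null;                   // no unique extremum
        double dx = -b / (2 * a);                  // offset from u, Eqn. (C.9)
        double yExt = a * dx * dx + b * dx + c;    // Eqn. (C.10), in coordinates relative to u
        return new double[] {u + dx, yExt};        // Eqn. (C.11)
    }

    public static void main(String[] args) {
        double[] e = extremum(0, -2, 7, 6);        // sample values of the Fig. C.2 example
        System.out.println("x = " + e[0] + ", y = " + e[1]);   // approx. x = 0.4, y = 7.8
    }
}
```

Whether the result is a maximum or a minimum follows from the sign of a (a < 0 for a maximum, as in the Fig. C.2 example).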


C.2  Scalar  and  Vector  Fields 


An RGB color image $I(u, v) = (I_R(u, v), I_G(u, v), I_B(u, v))$ can be considered a 2D function whose values are 3D vectors. Mathematically, this is a special case of a vector-valued function f: ℝⁿ → ℝᵐ,

$$\mathbf{f}(\mathbf{x}) = \mathbf{f}(x_0, \ldots, x_{n-1}) = \begin{pmatrix} f_0(\mathbf{x})\\ \vdots\\ f_{m-1}(\mathbf{x}) \end{pmatrix} , \tag{C.12}$$

which is composed of m scalar-valued functions $f_i$: ℝⁿ → ℝ, each being defined on the domain of n-dimensional vectors.

A multi-variable, scalar-valued function f: ℝⁿ → ℝ is called a scalar field, while a vector-valued function f: ℝⁿ → ℝᵐ is referred to as a vector field.


Fig. C.2 Fitting a quadratic function to three reference points at positions x1 = −1, x2 = 0, x3 = +1. The interpolated, continuous curve has a maximum at the continuous position x̆ = 0.4 (large circle).
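C.2.1 The Jacobian Matrix

Assuming that the function $\mathbf{f}(\mathbf{x}) = (f_0(\mathbf{x}), \ldots, f_{m-1}(\mathbf{x}))^\top$ is differentiable, the so-called functional or Jacobian matrix at a specific point $\dot{\mathbf{x}} = (\dot{x}_0, \ldots, \dot{x}_{n-1})$ is defined as

$$\mathbf{J}_{\mathbf{f}}(\dot{\mathbf{x}}) = \begin{pmatrix} \frac{\partial}{\partial x_0} f_0(\dot{\mathbf{x}}) & \cdots & \frac{\partial}{\partial x_{n-1}} f_0(\dot{\mathbf{x}})\\ \vdots & \ddots & \vdots\\ \frac{\partial}{\partial x_0} f_{m-1}(\dot{\mathbf{x}}) & \cdots & \frac{\partial}{\partial x_{n-1}} f_{m-1}(\dot{\mathbf{x}}) \end{pmatrix} . \tag{C.13}$$

The Jacobian matrix is of size m × n and composed of the first derivatives of the m component functions $f_0, \ldots, f_{m-1}$ with respect to each of the n independent variables $x_0, \ldots, x_{n-1}$. Thus each of its elements $\frac{\partial}{\partial x_j} f_i(\dot{\mathbf{x}})$ quantifies how much the value of the scalar-valued component function $f_i(\dot{\mathbf{x}}) = f_i(\dot{x}_0, \ldots, \dot{x}_{n-1})$ changes when only variable $x_j$ is varied and all other variables remain fixed. Note that the matrix $\mathbf{J}_{\mathbf{f}}(\dot{\mathbf{x}})$ is not constant for a given function f but is different at each position ẋ. In general, the Jacobian matrix is neither square (unless m = n) nor symmetric.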
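When f is only available as a procedure, the Jacobian of Eqn. (C.13) can be estimated numerically, one column (one variable $x_j$) at a time. The sketch below (class name hypothetical) uses central differences with a user-chosen step h; the value 1e-5 in the example is an assumption that balances truncation and rounding errors for smooth, well-scaled functions, not a universal choice.

```java
import java.util.function.Function;

/** Hypothetical sketch: numerical Jacobian (Eqn. (C.13)) by central differences. */
public class NumericJacobian {

    /** Returns the m x n matrix J[i][j] ~ d f_i / d x_j at point x, using step size h. */
    public static double[][] jacobian(Function<double[], double[]> f, double[] x, double h) {
        int n = x.length;
        int m = f.apply(x).length;
        double[][] J = new double[m][n];
        for (int j = 0; j < n; j++) {
            double[] xp = x.clone(), xm = x.clone();
            xp[j] += h;               // vary only x_j, keep all other variables fixed
            xm[j] -= h;
            double[] fp = f.apply(xp), fm = f.apply(xm);
            for (int i = 0; i < m; i++) {
                J[i][j] = (fp[i] - fm[i]) / (2 * h);
            }
        }
        return J;
    }

    public static void main(String[] args) {
        // f(x0, x1) = (x0 * x1, x0^2, sin(x1)), a mapping R^2 -> R^3, so J is 3 x 2
        Function<double[], double[]> f =
            v -> new double[] {v[0] * v[1], v[0] * v[0], Math.sin(v[1])};
        double[][] J = jacobian(f, new double[] {2, 3}, 1e-5);
        for (double[] row : J) System.out.println(java.util.Arrays.toString(row));
        // analytic Jacobian at (2, 3): [[3, 2], [4, 0], [0, cos(3)]]
    }
}
```

The example also illustrates that the Jacobian of a mapping ℝ² → ℝ³ is not square.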


C.2.2  Gradients 


Gradient  of  a  scalar  field 

The gradient of a scalar field f: ℝⁿ → ℝ, with $f(\mathbf{x}) = f(x_0, \ldots, x_{n-1})$, at a given position ẋ ∈ ℝⁿ is defined as

$$(\nabla f)(\dot{\mathbf{x}}) = (\operatorname{grad} f)(\dot{\mathbf{x}}) = \begin{pmatrix} \frac{\partial f}{\partial x_0}(\dot{\mathbf{x}})\\ \vdots\\ \frac{\partial f}{\partial x_{n-1}}(\dot{\mathbf{x}}) \end{pmatrix} . \tag{C.14}$$

The resulting vector-valued function quantifies the amount of output change with respect to changing any of the input variables $x_0, \ldots, x_{n-1}$ at position ẋ. Thus the gradient of a scalar field is a vector field.

The directional gradient of a scalar field describes how the (scalar) function value changes when the coordinates are modified along a particular direction, specified by the unit vector e. We denote the directional gradient as $\nabla_{\mathbf{e}} f$ and define

$$(\nabla_{\mathbf{e}} f)(\dot{\mathbf{x}}) = (\nabla f)(\dot{\mathbf{x}}) \cdot \mathbf{e} , \tag{C.15}$$

where · is the scalar product (see Sec. B.3.1). The result is a scalar value that can be interpreted as the slope of the tangent on the n-dimensional surface of the scalar field at position ẋ along the direction specified by the n-dimensional unit vector $\mathbf{e} = (e_0, \ldots, e_{n-1})^\top$.


Gradient  of  a  vector  field 

To calculate the gradient of a vector field f: ℝⁿ → ℝᵐ, we note that each row i in the m × n Jacobian matrix $\mathbf{J}_{\mathbf{f}}$ (Eqn. (C.13)) is the transposed gradient vector of the corresponding component function $f_i$, that is,

$$\mathbf{J}_{\mathbf{f}}(\dot{\mathbf{x}}) = \begin{pmatrix} (\nabla f_0)(\dot{\mathbf{x}})^\top\\ \vdots\\ (\nabla f_{m-1})(\dot{\mathbf{x}})^\top \end{pmatrix} , \tag{C.16}$$

and thus the Jacobian matrix is equivalent to the gradient of the vector field f,

$$(\operatorname{grad}\mathbf{f})(\dot{\mathbf{x}}) = \mathbf{J}_{\mathbf{f}}(\dot{\mathbf{x}}) . \tag{C.17}$$

Analogous to Eqn. (C.15), the directional gradient of the vector field is then defined as

$$(\operatorname{grad}_{\mathbf{e}}\mathbf{f})(\dot{\mathbf{x}}) = \mathbf{J}_{\mathbf{f}}(\dot{\mathbf{x}}) \cdot \mathbf{e} , \tag{C.18}$$

where e is again a unit vector specifying the gradient direction and · is the ordinary matrix-vector product. In this case the resulting gradient is an m-dimensional vector with one element for each component function in f.


C.2.3 Maximum Gradient Direction

In case of a scalar field f(x), a resulting non-zero gradient vector (∇f)(ẋ) (Eqn. (C.14)) is also the direction of the steepest ascent of f(x) at position ẋ.¹ In this case, the L2 norm (see Sec. B.1.2) of the gradient vector, that is, ‖(∇f)(ẋ)‖, corresponds to the maximum slope of f at point ẋ.

In case of a vector field f(x), the direction of maximum slope cannot be obtained directly, since the gradient is not an n-dimensional vector but the m × n Jacobian matrix. In this case, the direction of maximum change in the function f is found as the eigenvector $\mathbf{x}_k$ of the square (n × n) matrix

$$\mathbf{M} = \mathbf{J}_{\mathbf{f}}^\top(\dot{\mathbf{x}}) \cdot \mathbf{J}_{\mathbf{f}}(\dot{\mathbf{x}}) \tag{C.19}$$

that corresponds to its largest eigenvalue $\lambda_k$ (see also Sec. B.4).
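For n = 2, as in the common case of a color image with m = 3 channels, the matrix M of Eqn. (C.19) is a symmetric 2 × 2 matrix, so its dominant eigenpair can be computed in closed form (the symmetric case of Alg. B.1). The following plain-Java sketch, with a hypothetical class name, takes the m × 2 Jacobian at one point and returns the largest eigenvalue together with the unit direction of maximum change; it uses the half-angle formula θ = ½·atan2(2b, a − c) for the dominant eigenvector of the symmetric matrix [[a, b], [b, c]].

```java
/** Hypothetical sketch of Eqn. (C.19) for n = 2: direction of maximum change of f: R^2 -> R^m. */
public class MaxGradientDirection {

    /**
     * J is the m x 2 Jacobian at some point. Returns {lambdaMax, ex, ey}, where (ex, ey) is the
     * unit eigenvector of M = J^T J that belongs to its largest eigenvalue lambdaMax.
     */
    public static double[] maxDirection(double[][] J) {
        double a = 0, b = 0, c = 0;                 // M = [[a, b], [b, c]] (symmetric)
        for (double[] row : J) {
            a += row[0] * row[0];
            b += row[0] * row[1];
            c += row[1] * row[1];
        }
        double s = (a - c) / 2;
        double lambdaMax = (a + c) / 2 + Math.sqrt(s * s + b * b);
        double theta = 0.5 * Math.atan2(2 * b, a - c);   // orientation of the dominant eigenvector
        return new double[] {lambdaMax, Math.cos(theta), Math.sin(theta)};
    }

    public static void main(String[] args) {
        // Jacobian of a 3-channel function at one point: rows are (df_k/dx, df_k/dy)
        double[][] J = {{1, 2}, {3, -1}, {0.5, 0.5}};
        double[] r = maxDirection(J);
        System.out.println("lambdaMax = " + r[0] + ", direction = (" + r[1] + ", " + r[2] + ")");
    }
}
```

The square root of lambdaMax is the maximum rate of change ‖J_f · e‖ over all unit vectors e. If J_f is zero, the direction is undefined and the sketch simply returns (1, 0).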


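C.2.4 Divergence of a Vector Field

If the vector field maps to the same vector space (i.e., f: ℝⁿ → ℝⁿ), its divergence (div) is defined as

$$(\operatorname{div}\mathbf{f})(\dot{\mathbf{x}}) = \frac{\partial}{\partial x_0} f_0(\dot{\mathbf{x}}) + \cdots + \frac{\partial}{\partial x_{n-1}} f_{n-1}(\dot{\mathbf{x}}) \tag{C.20}$$

$$= \sum_{i=0}^{n-1} \frac{\partial}{\partial x_i} f_i(\dot{\mathbf{x}}) , \tag{C.21}$$

for a given point ẋ. The result is a scalar value and thus (div f)(ẋ) yields a scalar field ℝⁿ → ℝ. Note that, in this case, the Jacobian matrix $\mathbf{J}_{\mathbf{f}}$ in Eqn. (C.13) is square (of size n × n) and div f is equivalent to the trace of $\mathbf{J}_{\mathbf{f}}$, that is,

$$(\operatorname{div}\mathbf{f})(\dot{\mathbf{x}}) = \operatorname{trace}(\mathbf{J}_{\mathbf{f}}(\dot{\mathbf{x}})) . \tag{C.22}$$

1 If the gradient vector is zero, that is, if (∇f)(ẋ) = 0, the direction of the gradient is undefined at position ẋ.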
C.2.5 Laplacian Operator

The Laplacian (or Laplace operator) of a scalar field f: ℝⁿ → ℝ is a linear differential operator, commonly denoted Δ or ∇². Applying ∇² to the scalar field f generates another scalar field that consists of the sum of all unmixed second-order partial derivatives of f (if existent), that is,

$$(\nabla^2 f)(\dot{\mathbf{x}}) = \frac{\partial^2}{\partial x_0^2} f(\dot{\mathbf{x}}) + \cdots + \frac{\partial^2}{\partial x_{n-1}^2} f(\dot{\mathbf{x}}) = \sum_{i=0}^{n-1} \frac{\partial^2}{\partial x_i^2} f(\dot{\mathbf{x}}) . \tag{C.23}$$

The result is a scalar value that is equivalent to the divergence (see Eqn. (C.21)) of the gradient (see Eqn. (C.14)) of the scalar field f, that is,

$$(\nabla^2 f)(\dot{\mathbf{x}}) = (\operatorname{div}\nabla f)(\dot{\mathbf{x}}) . \tag{C.24}$$

The Laplacian is also found as the trace of the function's Hessian matrix $\mathbf{H}_f$ (see Sec. C.2.6).

For a vector-valued function f: ℝⁿ → ℝᵐ, the Laplacian ∇²f is again a vector field ℝⁿ → ℝᵐ, whose value at point ẋ,

$$(\nabla^2\mathbf{f})(\dot{\mathbf{x}}) = \begin{pmatrix} (\nabla^2 f_0)(\dot{\mathbf{x}})\\ (\nabla^2 f_1)(\dot{\mathbf{x}})\\ \vdots\\ (\nabla^2 f_{m-1})(\dot{\mathbf{x}}) \end{pmatrix} \in \mathbb{R}^m , \tag{C.25}$$

is obtained by applying the Laplacian to the individual (scalar-valued) component functions.


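C.2.6 The Hessian Matrix

The Hessian matrix of an n-variable, real-valued function f: ℝⁿ → ℝ is the n × n square matrix composed of its second-order partial derivatives (assuming they all exist), that is,

$$\mathbf{H}_f = \begin{pmatrix} H_{0,0} & H_{0,1} & \cdots & H_{0,n-1}\\ H_{1,0} & H_{1,1} & \cdots & H_{1,n-1}\\ \vdots & \vdots & \ddots & \vdots\\ H_{n-1,0} & H_{n-1,1} & \cdots & H_{n-1,n-1} \end{pmatrix} \tag{C.26}$$

$$= \begin{pmatrix} \frac{\partial^2 f}{\partial x_0^2} & \frac{\partial^2 f}{\partial x_0\,\partial x_1} & \cdots & \frac{\partial^2 f}{\partial x_0\,\partial x_{n-1}}\\ \frac{\partial^2 f}{\partial x_1\,\partial x_0} & \frac{\partial^2 f}{\partial x_1^2} & \cdots & \frac{\partial^2 f}{\partial x_1\,\partial x_{n-1}}\\ \vdots & \vdots & \ddots & \vdots\\ \frac{\partial^2 f}{\partial x_{n-1}\,\partial x_0} & \frac{\partial^2 f}{\partial x_{n-1}\,\partial x_1} & \cdots & \frac{\partial^2 f}{\partial x_{n-1}^2} \end{pmatrix} . \tag{C.27}$$

Since the order of differentiation does not matter (i.e., $H_{i,j} = H_{j,i}$, which holds whenever the second partial derivatives are continuous), $\mathbf{H}_f$ is symmetric. Note that the Hessian is a matrix of functions. To evaluate the Hessian at a particular point ẋ ∈ ℝⁿ, we write

$$\mathbf{H}_f(\dot{\mathbf{x}}) = \begin{pmatrix} \frac{\partial^2 f}{\partial x_0^2}(\dot{\mathbf{x}}) & \cdots & \frac{\partial^2 f}{\partial x_0\,\partial x_{n-1}}(\dot{\mathbf{x}})\\ \vdots & \ddots & \vdots\\ \frac{\partial^2 f}{\partial x_{n-1}\,\partial x_0}(\dot{\mathbf{x}}) & \cdots & \frac{\partial^2 f}{\partial x_{n-1}^2}(\dot{\mathbf{x}}) \end{pmatrix} , \tag{C.28}$$

which is a scalar-valued matrix of size n × n. As mentioned already, the trace of the Hessian matrix is the Laplacian ∇² of the function f, that is,

$$\nabla^2 f = \operatorname{trace}(\mathbf{H}_f) = \sum_{i=0}^{n-1} \frac{\partial^2 f}{\partial x_i^2} . \tag{C.29}$$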

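Example

Given a 2D, continuous, grayscale image or scalar-valued intensity function I(x, y), the corresponding Hessian matrix (of size 2 × 2) contains all second derivatives along the coordinates x, y, that is,

$$\mathbf{H}_I = \begin{pmatrix} \frac{\partial^2 I}{\partial x^2} & \frac{\partial^2 I}{\partial x\,\partial y}\\ \frac{\partial^2 I}{\partial y\,\partial x} & \frac{\partial^2 I}{\partial y^2} \end{pmatrix} . \tag{C.30}$$

The elements of $\mathbf{H}_I$ are 2D, scalar-valued functions over x, y and thus scalar fields again. Evaluating the Hessian matrix at a particular point ẋ yields the values of the second partial derivatives of I at this position,

$$\mathbf{H}_I(\dot{\mathbf{x}}) = \begin{pmatrix} \frac{\partial^2 I}{\partial x^2}(\dot{\mathbf{x}}) & \frac{\partial^2 I}{\partial x\,\partial y}(\dot{\mathbf{x}})\\ \frac{\partial^2 I}{\partial y\,\partial x}(\dot{\mathbf{x}}) & \frac{\partial^2 I}{\partial y^2}(\dot{\mathbf{x}}) \end{pmatrix} , \tag{C.31}$$

that is, a matrix with scalar-valued elements.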


C.3 Operations on Multi-Variable, Scalar Functions (Scalar Fields)

C.3.1 Estimating the Derivatives of a Discrete Function

Images are typically discrete functions (i.e., I: ℕ² → ℝ) and thus not differentiable. The derivatives can nevertheless be estimated by calculating finite differences from the pixel values in a 3 × 3 neighborhood, which can be expressed as a linear filter or convolution operation (∗). In particular, the first-order derivatives $I_x = \partial I/\partial x$ and $I_y = \partial I/\partial y$ are usually estimated in the form

$$I_x \approx I * \begin{bmatrix} -0.5 & 0 & 0.5 \end{bmatrix}, \qquad I_y \approx I * \begin{bmatrix} -0.5\\ 0\\ 0.5 \end{bmatrix}, \tag{C.32}$$

the second-order derivatives $I_{xx} = \partial^2 I/\partial x^2$ and $I_{yy} = \partial^2 I/\partial y^2$ as

$$I_{xx} \approx I * \begin{bmatrix} 1 & -2 & 1 \end{bmatrix}, \qquad I_{yy} \approx I * \begin{bmatrix} 1\\ -2\\ 1 \end{bmatrix}, \tag{C.33}$$
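and the mixed derivative

$$I_{xy} = I_{yx} = \frac{\partial^2 I}{\partial x\,\partial y} \approx I * \begin{bmatrix} -0.5 & 0 & 0.5 \end{bmatrix} * \begin{bmatrix} -0.5\\ 0\\ 0.5 \end{bmatrix} = I * \begin{bmatrix} 0.25 & 0 & -0.25\\ 0 & 0 & 0\\ -0.25 & 0 & 0.25 \end{bmatrix} . \tag{C.34}$$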
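At a single interior pixel, the kernels of Eqns. (C.32) to (C.34) reduce to a handful of pixel differences. The sketch below (class name hypothetical) assumes the image is stored as a 2D array I[v][u] with row index v (the y direction) and column index u (the x direction). It writes the differences so that a positive I_x means intensity increasing with u, i.e., the kernels are applied without flipping (as a correlation); a true convolution would change the sign of I_x and I_y but not of I_xx, I_yy, or I_xy, whose kernels are symmetric under a 180° rotation. Border handling is omitted.

```java
/** Hypothetical helper: finite-difference derivatives at an interior pixel, cf. Eqns. (C.32)-(C.34). */
public class DiscreteDerivatives {

    /** Returns {Ix, Iy, Ixx, Iyy, Ixy} at (u, v); requires 1 <= u < width-1 and 1 <= v < height-1. */
    public static double[] derivatives(double[][] I, int u, int v) {
        double Ix  = 0.5 * (I[v][u + 1] - I[v][u - 1]);              // [-0.5 0 0.5] along x
        double Iy  = 0.5 * (I[v + 1][u] - I[v - 1][u]);              // [-0.5 0 0.5]^T along y
        double Ixx = I[v][u - 1] - 2 * I[v][u] + I[v][u + 1];        // [1 -2 1] along x
        double Iyy = I[v - 1][u] - 2 * I[v][u] + I[v + 1][u];        // [1 -2 1]^T along y
        double Ixy = 0.25 * (I[v + 1][u + 1] - I[v - 1][u + 1]
                           - I[v + 1][u - 1] + I[v - 1][u - 1]);     // 3x3 mixed kernel
        return new double[] {Ix, Iy, Ixx, Iyy, Ixy};
    }

    public static void main(String[] args) {
        // sample the quadratic I(u, v) = u^2 + 3uv + 2v^2 on a 5 x 5 grid
        double[][] I = new double[5][5];
        for (int v = 0; v < 5; v++)
            for (int u = 0; u < 5; u++)
                I[v][u] = u * u + 3 * u * v + 2 * v * v;
        double[] d = derivatives(I, 2, 2);
        System.out.println(java.util.Arrays.toString(d));   // prints [10.0, 14.0, 2.0, 4.0, 3.0]
    }
}
```

For a quadratic test image these differences are exact, so the Hessian at (2, 2) is [[2, 3], [3, 4]] and its trace, the Laplacian, is 6.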


C.3.2 Taylor Series Expansion of Functions

Single-variable functions

The Taylor series expansion (of degree d) of a single-variable function f: ℝ → ℝ about a reference point a is

$$f(x) = f(a) + f'(a)\cdot(x-a) + f''(a)\cdot\frac{(x-a)^2}{2} + \cdots + f^{(d)}(a)\cdot\frac{(x-a)^d}{d!} + R_d \tag{C.35}$$

$$= f(a) + \sum_{i=1}^{d} f^{(i)}(a)\cdot\frac{(x-a)^i}{i!} + R_d \tag{C.36}$$

$$= \sum_{i=0}^{d} f^{(i)}(a)\cdot\frac{(x-a)^i}{i!} + R_d , \tag{C.37}$$

where $R_d$ is the residual term.² This means that if the value f(a) and the first d derivatives $f'(a), f''(a), \ldots, f^{(d)}(a)$ exist and are known at some position a, the value of f at another point x can be estimated (up to the residual $R_d$) only from the values at point a, without actually evaluating f(x). Omitting the remainder $R_d$, the result is an approximation for f(x), that is,

$$f(x) \approx \sum_{i=0}^{d} f^{(i)}(a)\cdot\frac{(x-a)^i}{i!} , \tag{C.38}$$

whose accuracy depends upon d and the distance x − a.
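Eqn. (C.38) is easy to try out on a function whose derivatives are all known. The following sketch (hypothetical class name) uses f(x) = eˣ, for which every derivative f⁽ⁱ⁾(a) equals eᵃ, and prints how the approximation error shrinks as the degree d grows for a fixed distance x − a.

```java
/** Hypothetical sketch of Eqn. (C.38) for f = exp, whose i-th derivative is again exp. */
public class TaylorExp {

    /** Degree-d Taylor approximation of exp(x) about the reference point a. */
    public static double taylorExp(double x, double a, int d) {
        double fa = Math.exp(a);      // f^(i)(a) = exp(a) for every i
        double sum = 0;
        double term = 1;              // (x - a)^i / i!, starting with i = 0 (0! = 1)
        for (int i = 0; i <= d; i++) {
            sum += fa * term;
            term *= (x - a) / (i + 1);
        }
        return sum;
    }

    public static void main(String[] args) {
        double x = 1.5, a = 1.0;
        for (int d = 0; d <= 8; d++) {
            double approx = taylorExp(x, a, d);
            System.out.printf("d = %d: %.10f (error %.2e)%n", d, approx, Math.abs(approx - Math.exp(x)));
        }
    }
}
```

Updating the term (x − a)ⁱ/i! incrementally avoids computing large powers and factorials separately.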


Multi-variable functions

In general, for a real-valued function of n variables,

$$f(\mathbf{x}) = f(x_0, x_1, \ldots, x_{n-1}) \in \mathbb{R},$$

the full Taylor series expansion about a reference point $\mathbf{a} = (a_0, \ldots, a_{n-1})^\top$ is

$$f(x_0, \ldots, x_{n-1}) = \sum_{i_0=0}^{\infty} \cdots \sum_{i_{n-1}=0}^{\infty} \left[\frac{\partial^{i_0}}{\partial x_0^{i_0}} \cdots \frac{\partial^{i_{n-1}}}{\partial x_{n-1}^{i_{n-1}}} f\right]\!(\mathbf{a}) \cdot \frac{(x_0 - a_0)^{i_0} \cdots (x_{n-1} - a_{n-1})^{i_{n-1}}}{i_0! \cdots i_{n-1}!} . \tag{C.39}$$
2 Note that $f^{(0)} = f$, $f^{(1)} = f'$, $f^{(2)} = f''$, etc., and 0! = 1! = 1.


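In Eqn. (C.39),³ the term

$$\left[\frac{\partial^{i_0}}{\partial x_0^{i_0}} \frac{\partial^{i_1}}{\partial x_1^{i_1}} \cdots \frac{\partial^{i_{n-1}}}{\partial x_{n-1}^{i_{n-1}}} f\right]\!(\mathbf{a}) \tag{C.40}$$

is the value of the function f, after applying a sequence of n partial derivatives, at the n-dimensional position a. The operator $\frac{\partial^i}{\partial x_k^i}$ denotes the i-th partial derivative on the variable $x_k$.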

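To formulate Eqn. (C.39) in a more compact fashion, we define the index vector

$$\mathbf{i} = (i_0, i_1, \ldots, i_{n-1}) \tag{C.41}$$

(with $i_k \in \mathbb{N}_0$ and thus $\mathbf{i} \in \mathbb{N}_0^n$), and the associated operations

$$\mathbf{i}! = i_0! \cdot i_1! \cdots i_{n-1}!, \qquad \mathbf{x}^{\mathbf{i}} = x_0^{i_0} \cdot x_1^{i_1} \cdots x_{n-1}^{i_{n-1}}, \qquad \Sigma\mathbf{i} = i_0 + i_1 + \cdots + i_{n-1} . \tag{C.42}$$

As a shorthand notation for the combined partial derivative operator in Eqn. (C.40) we define

$$D^{\mathbf{i}} = \frac{\partial^{i_0}}{\partial x_0^{i_0}} \frac{\partial^{i_1}}{\partial x_1^{i_1}} \cdots \frac{\partial^{i_{n-1}}}{\partial x_{n-1}^{i_{n-1}}} = \frac{\partial^{i_0 + i_1 + \cdots + i_{n-1}}}{\partial x_0^{i_0}\, \partial x_1^{i_1} \cdots \partial x_{n-1}^{i_{n-1}}} . \tag{C.43}$$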

With these definitions, the full Taylor expansion of a multi-variable function about a point a, as given in Eqn. (C.39), can be elegantly written in the form

$$f(\mathbf{x}) = \sum_{\mathbf{i} \in \mathbb{N}_0^n} [D^{\mathbf{i}} f](\mathbf{a}) \cdot \frac{(\mathbf{x} - \mathbf{a})^{\mathbf{i}}}{\mathbf{i}!} . \tag{C.44}$$

Note that $D^{\mathbf{i}} f$ is again an n-variable function ℝⁿ → ℝ, and thus $[D^{\mathbf{i}} f](\mathbf{a})$ in Eqn. (C.44) is the scalar quantity obtained by evaluating the function $[D^{\mathbf{i}} f]$ at the n-dimensional point a.

To obtain a Taylor approximation of order d, the sum of the indices $i_0, \ldots, i_{n-1}$ is limited to d, that is, the summation is constrained to index vectors i with Σi ≤ d. The resulting formulation,

$$f(\mathbf{x}) \approx \sum_{\Sigma\mathbf{i} \leq d} [D^{\mathbf{i}} f](\mathbf{a}) \cdot \frac{(\mathbf{x} - \mathbf{a})^{\mathbf{i}}}{\mathbf{i}!} , \tag{C.45}$$

is obviously analogous to the 1D case in Eqn. (C.38).
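The constrained summation range Σi ≤ d of Eqn. (C.45) can be enumerated recursively. The hypothetical helper below lists all index vectors i ∈ ℕ₀ⁿ with Σi ≤ d; the number of such vectors is the binomial coefficient (n + d choose d), for example 6 for n = d = 2 and 10 for n = 3, d = 2.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: all index vectors i in N_0^n with sum(i) <= d (summation range of Eqn. (C.45)). */
public class IndexVectors {

    public static List<int[]> enumerate(int n, int d) {
        List<int[]> result = new ArrayList<>();
        fill(new int[n], 0, d, result);
        return result;
    }

    // assigns i_k for k = pos..n-1; 'remaining' is what is left of the budget d for sum(i)
    private static void fill(int[] idx, int pos, int remaining, List<int[]> out) {
        if (pos == idx.length) {
            out.add(idx.clone());
            return;
        }
        for (int v = 0; v <= remaining; v++) {
            idx[pos] = v;
            fill(idx, pos + 1, remaining - v, out);
        }
        idx[pos] = 0;
    }

    public static void main(String[] args) {
        for (int[] i : enumerate(2, 2)) {
            System.out.println(java.util.Arrays.toString(i));
        }
    }
}
```

For n = d = 2 it produces (0,0), (0,1), (0,2), (1,0), (1,1), (2,0), the same six index vectors that appear in the two-variable example, only in a different order.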

3 Note that symbols $x_0, \ldots, x_{n-1}$ denote the individual variables, while $\dot{x}_0, \ldots, \dot{x}_{n-1}$ are the coordinates of a specific point in n-dimensional space.

Example: two-variable (2D) function

This example demonstrates the second-order (d = 2) Taylor expansion of a 2D (n = 2) function f: ℝ² → ℝ around a point $\mathbf{a} = (x_a, y_a)$. By inserting into Eqn. (C.45), we get
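$$f(x, y) \approx \sum_{\Sigma\mathbf{i} \leq 2} [D^{\mathbf{i}} f](x_a, y_a) \cdot \frac{(x - x_a,\, y - y_a)^{\mathbf{i}}}{\mathbf{i}!} \tag{C.46}$$

$$= \sum_{\substack{0 \leq i, j \leq 2\\ (i+j) \leq 2}} \frac{\partial^{i+j}}{\partial x^i\, \partial y^j} f(x_a, y_a) \cdot \frac{(x - x_a)^i \cdot (y - y_a)^j}{i! \cdot j!} . \tag{C.47}$$

Since d = 2, the six permissible index vectors i = (i, j), with Σi ≤ 2, are (0,0), (1,0), (0,1), (1,1), (2,0), and (0,2). Inserting into Eqn. (C.47), we obtain the corresponding Taylor approximation at position (x, y) as

$$\begin{aligned} f(x, y) \approx\; & \frac{\partial^0}{\partial x^0 \partial y^0} f(x_a, y_a) \cdot \frac{(x-x_a)^0 \cdot (y-y_a)^0}{1 \cdot 1} + \frac{\partial^1}{\partial x^1 \partial y^0} f(x_a, y_a) \cdot \frac{(x-x_a)^1 \cdot (y-y_a)^0}{1 \cdot 1}\\ & + \frac{\partial^1}{\partial x^0 \partial y^1} f(x_a, y_a) \cdot \frac{(x-x_a)^0 \cdot (y-y_a)^1}{1 \cdot 1} + \frac{\partial^2}{\partial x^1 \partial y^1} f(x_a, y_a) \cdot \frac{(x-x_a)^1 \cdot (y-y_a)^1}{1 \cdot 1}\\ & + \frac{\partial^2}{\partial x^2 \partial y^0} f(x_a, y_a) \cdot \frac{(x-x_a)^2 \cdot (y-y_a)^0}{2 \cdot 1} + \frac{\partial^2}{\partial x^0 \partial y^2} f(x_a, y_a) \cdot \frac{(x-x_a)^0 \cdot (y-y_a)^2}{1 \cdot 2} \end{aligned} \tag{C.48}$$

$$\begin{aligned} = f(x_a, y_a) & + \frac{\partial}{\partial x} f(x_a, y_a) \cdot (x - x_a) + \frac{\partial}{\partial y} f(x_a, y_a) \cdot (y - y_a) + \frac{\partial^2}{\partial x\, \partial y} f(x_a, y_a) \cdot (x - x_a) \cdot (y - y_a)\\ & + \frac{1}{2} \cdot \frac{\partial^2}{\partial x^2} f(x_a, y_a) \cdot (x - x_a)^2 + \frac{1}{2} \cdot \frac{\partial^2}{\partial y^2} f(x_a, y_a) \cdot (y - y_a)^2 . \end{aligned} \tag{C.49}$$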

It is assumed that the required derivatives of f exist, that is, f is differentiable at point $(x_a, y_a)$ with respect to x and y up to the second order. By slightly rearranging Eqn. (C.49) to


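$$ \begin{aligned}
f(x,y) \approx{}& f(x_a,y_a) + \frac{\partial f}{\partial x}(x_a,y_a) \cdot (x-x_a) + \frac{\partial f}{\partial y}(x_a,y_a) \cdot (y-y_a) \\
&+ \frac{1}{2} \cdot \Bigl[ \frac{\partial^2 f}{\partial x^2}(x_a,y_a) \cdot (x-x_a)^2 + 2 \cdot \frac{\partial^2 f}{\partial x\, \partial y}(x_a,y_a) \cdot (x-x_a) \cdot (y-y_a) + \frac{\partial^2 f}{\partial y^2}(x_a,y_a) \cdot (y-y_a)^2 \Bigr] ,
\end{aligned} \qquad \text{(C.50)} $$

we can now write the Taylor expansion in matrix-vector notation as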


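$$ \begin{aligned}
f(x,y) \approx \tilde f(x,y) ={}& f(x_a,y_a) + \Bigl( \frac{\partial f}{\partial x}(x_a,y_a),\; \frac{\partial f}{\partial y}(x_a,y_a) \Bigr) \cdot \begin{pmatrix} x-x_a \\ y-y_a \end{pmatrix} \\
&+ \frac{1}{2} \cdot (x-x_a,\; y-y_a) \cdot \begin{pmatrix} \frac{\partial^2 f}{\partial x^2}(x_a,y_a) & \frac{\partial^2 f}{\partial x\, \partial y}(x_a,y_a) \\ \frac{\partial^2 f}{\partial y\, \partial x}(x_a,y_a) & \frac{\partial^2 f}{\partial y^2}(x_a,y_a) \end{pmatrix} \cdot \begin{pmatrix} x-x_a \\ y-y_a \end{pmatrix} ,
\end{aligned} \qquad \text{(C.51)} $$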


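or, even more compactly, in the form

$$ \tilde f(\boldsymbol{x}) = f(\boldsymbol{a}) + \nabla_f(\boldsymbol{a}) \cdot (\boldsymbol{x}-\boldsymbol{a}) + \frac{1}{2} \cdot (\boldsymbol{x}-\boldsymbol{a})^{\mathsf{T}} \cdot \mathbf{H}_f(\boldsymbol{a}) \cdot (\boldsymbol{x}-\boldsymbol{a}) . \qquad \text{(C.52)} $$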
Here $\nabla_f(\boldsymbol{a})$ denotes the (transposed) gradient vector of the function f at point $\boldsymbol{a}$ (see Sec. C.2.2), and $\mathbf{H}_f$ is the 2×2 Hessian matrix of f (see Sec. C.2.6),


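$$ \mathbf{H}_f(\boldsymbol{a}) = \begin{pmatrix} \frac{\partial^2 f}{\partial x^2}(\boldsymbol{a}) & \frac{\partial^2 f}{\partial x\, \partial y}(\boldsymbol{a}) \\ \frac{\partial^2 f}{\partial y\, \partial x}(\boldsymbol{a}) & \frac{\partial^2 f}{\partial y^2}(\boldsymbol{a}) \end{pmatrix} = \begin{pmatrix} H_{00} & H_{01} \\ H_{10} & H_{11} \end{pmatrix} . \qquad \text{(C.53)} $$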
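The compact form in Eqn. (C.52) translates directly into code. The following Java sketch is illustrative (the class name is hypothetical). It evaluates $\tilde f(\boldsymbol{x})$ for any dimension n, given $f(\boldsymbol{a})$, the gradient, and the Hessian at the expansion point, and it assumes the Hessian is supplied as a full n×n array.

```java
/** Illustrative evaluation of the second-order Taylor approximation, Eqn. (C.52). */
final class QuadraticTaylor {

    /**
     * Returns f(a) + grad . (x - a) + 0.5 * (x - a)^T * H * (x - a).
     *
     * @param fa   function value f(a)
     * @param grad gradient of f at a (length n)
     * @param hess Hessian of f at a (n x n)
     * @param a    expansion point
     * @param x    evaluation point
     */
    static double evaluate(double fa, double[] grad, double[][] hess, double[] a, double[] x) {
        int n = a.length;
        double[] d = new double[n];           // offset x - a
        for (int k = 0; k < n; k++) {
            d[k] = x[k] - a[k];
        }
        double linear = 0;                     // grad . d
        double quadratic = 0;                  // d^T * H * d
        for (int i = 0; i < n; i++) {
            linear += grad[i] * d[i];
            for (int j = 0; j < n; j++) {
                quadratic += d[i] * hess[i][j] * d[j];
            }
        }
        return fa + linear + 0.5 * quadratic;
    }
}
```

For a function that is itself quadratic (for example $f(x,y) = 1 + 2x - 3y + x^2 + xy + 2y^2$), the approximation reproduces f exactly at every point. This makes it a convenient sanity check.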


If the function f is discrete, for example, a scalar-valued image, the required partial derivatives at some lattice point $\boldsymbol{a} = (u_a, v_a)^{\mathsf{T}}$ can be estimated from its 3×3 neighborhood, as described in Sec. C.3.1.


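Example: three-variable (3D) function

For a 3D function $f: \mathbb{R}^3 \to \mathbb{R}$, the second-order Taylor expansion (d = 2) is analogous to Eqns. (C.51-C.52) for the 2D case, except that now the positions $\boldsymbol{x} = (x, y, z)^{\mathsf{T}}$ and $\boldsymbol{a} = (x_a, y_a, z_a)^{\mathsf{T}}$ are 3D vectors. The associated (transposed) gradient vector is

$$ \nabla_f(\boldsymbol{a}) = \Bigl( \frac{\partial f}{\partial x}(\boldsymbol{a}),\; \frac{\partial f}{\partial y}(\boldsymbol{a}),\; \frac{\partial f}{\partial z}(\boldsymbol{a}) \Bigr) , \qquad \text{(C.54)} $$

and the Hessian, composed of all second-order partial derivatives, is the 3×3 matrix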


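$$ \mathbf{H}_f(\boldsymbol{a}) = \begin{pmatrix} \frac{\partial^2 f}{\partial x^2}(\boldsymbol{a}) & \frac{\partial^2 f}{\partial x\, \partial y}(\boldsymbol{a}) & \frac{\partial^2 f}{\partial x\, \partial z}(\boldsymbol{a}) \\ \frac{\partial^2 f}{\partial y\, \partial x}(\boldsymbol{a}) & \frac{\partial^2 f}{\partial y^2}(\boldsymbol{a}) & \frac{\partial^2 f}{\partial y\, \partial z}(\boldsymbol{a}) \\ \frac{\partial^2 f}{\partial z\, \partial x}(\boldsymbol{a}) & \frac{\partial^2 f}{\partial z\, \partial y}(\boldsymbol{a}) & \frac{\partial^2 f}{\partial z^2}(\boldsymbol{a}) \end{pmatrix} . \qquad \text{(C.55)} $$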


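Note that the order of differentiation is not relevant (provided the second-order partial derivatives are continuous) since, for example, $\frac{\partial^2 f}{\partial x\, \partial y} = \frac{\partial^2 f}{\partial y\, \partial x}$. Therefore $\mathbf{H}_f$ is symmetric.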

This  can  be  easily  generalized  to  the  n-dimensional  case,  though 
things  become  considerably  more  involved  for  Taylor  expansions  of 
higher  orders  (d  >  2). 


C.3.3 Finding the Continuous Extremum of a Multi-Variable Discrete Function

In Sec. C.1.2 we described how the position of a local extremum can be determined by fitting a quadratic function to the neighboring samples of a 1D function. This section shows how this technique can be extended to n-dimensional, scalar-valued functions $f: \mathbb{R}^n \to \mathbb{R}$.

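Without loss of generality we can assume that the Taylor expansion of the function $f(\boldsymbol{x})$ is carried out around the point $\boldsymbol{a} = \boldsymbol{0} = (0, \ldots, 0)$, which clearly simplifies the remaining formulation. The Taylor approximation function (see Eqn. (C.52)) for this point can be written as

$$ \tilde f(\boldsymbol{x}) = f(\boldsymbol{0}) + \nabla_f(\boldsymbol{0}) \cdot \boldsymbol{x} + \frac{1}{2} \cdot \boldsymbol{x}^{\mathsf{T}} \cdot \mathbf{H}_f(\boldsymbol{0}) \cdot \boldsymbol{x} , \qquad \text{(C.56)} $$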

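with the gradient $\nabla_f$ and the Hessian matrix $\mathbf{H}_f$ evaluated at position $\boldsymbol{0}$. The vector of the first derivative of this function is

$$ \tilde f'(\boldsymbol{x}) = \nabla_f(\boldsymbol{0}) + \frac{1}{2} \cdot \bigl[ (\boldsymbol{x}^{\mathsf{T}} \cdot \mathbf{H}_f(\boldsymbol{0}))^{\mathsf{T}} + \mathbf{H}_f(\boldsymbol{0}) \cdot \boldsymbol{x} \bigr] . \qquad \text{(C.57)} $$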
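Since $(\boldsymbol{x}^{\mathsf{T}} \cdot \mathbf{H}_f)^{\mathsf{T}} = \mathbf{H}_f^{\mathsf{T}} \cdot \boldsymbol{x}$ and because the Hessian matrix $\mathbf{H}_f$ is symmetric (i.e., $\mathbf{H}_f^{\mathsf{T}} = \mathbf{H}_f$), this simplifies to

$$ \tilde f'(\boldsymbol{x}) = \nabla_f(\boldsymbol{0}) + \frac{1}{2} \cdot \bigl( \mathbf{H}_f(\boldsymbol{0}) \cdot \boldsymbol{x} + \mathbf{H}_f(\boldsymbol{0}) \cdot \boldsymbol{x} \bigr) \qquad \text{(C.58)} $$

$$ = \nabla_f(\boldsymbol{0}) + \mathbf{H}_f(\boldsymbol{0}) \cdot \boldsymbol{x} . \qquad \text{(C.59)} $$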

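A local maximum or minimum is found where all first derivatives $\tilde f'$ are zero, so we need to solve

$$ \nabla_f(\boldsymbol{0}) + \mathbf{H}_f(\boldsymbol{0}) \cdot \breve{\boldsymbol{x}} = \boldsymbol{0} , \qquad \text{(C.60)} $$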

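for the unknown position $\breve{\boldsymbol{x}}$. By multiplying both sides with $\mathbf{H}_f^{-1}$ (assuming that the inverse of $\mathbf{H}_f(\boldsymbol{0})$ exists), the solution is

$$ \breve{\boldsymbol{x}} = -\mathbf{H}_f^{-1}(\boldsymbol{0}) \cdot \nabla_f(\boldsymbol{0}) , \qquad \text{(C.61)} $$

for the specific expansion point $\boldsymbol{a} = \boldsymbol{0}$. Analogously, for an arbitrary expansion point $\boldsymbol{a}$, the extremum position is

$$ \breve{\boldsymbol{x}} = \boldsymbol{a} - \mathbf{H}_f^{-1}(\boldsymbol{a}) \cdot \nabla_f(\boldsymbol{a}) . \qquad \text{(C.62)} $$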

Note that the inverse Hessian matrix $\mathbf{H}_f^{-1}$ is again symmetric.

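The estimated extremal value of the approximation function $\tilde f$ is found by replacing $\boldsymbol{x}$ in Eqn. (C.56) with the extremal position $\breve{\boldsymbol{x}}$ (calculated in Eqn. (C.61)) as

$$ \begin{aligned}
\tilde f_{\mathrm{extrm}} = \tilde f(\breve{\boldsymbol{x}}) &= f(\boldsymbol{0}) + \nabla_f(\boldsymbol{0}) \cdot \breve{\boldsymbol{x}} + \tfrac{1}{2} \cdot \breve{\boldsymbol{x}}^{\mathsf{T}} \cdot \mathbf{H}_f(\boldsymbol{0}) \cdot \breve{\boldsymbol{x}} \\
&= f(\boldsymbol{0}) + \nabla_f(\boldsymbol{0}) \cdot \breve{\boldsymbol{x}} + \tfrac{1}{2} \cdot \breve{\boldsymbol{x}}^{\mathsf{T}} \cdot \mathbf{H}_f(\boldsymbol{0}) \cdot \bigl(-\mathbf{H}_f^{-1}(\boldsymbol{0})\bigr) \cdot \nabla_f(\boldsymbol{0}) \\
&= f(\boldsymbol{0}) + \nabla_f(\boldsymbol{0}) \cdot \breve{\boldsymbol{x}} - \tfrac{1}{2} \cdot \breve{\boldsymbol{x}}^{\mathsf{T}} \cdot \mathbf{I} \cdot \nabla_f(\boldsymbol{0}) \\
&= f(\boldsymbol{0}) + \nabla_f(\boldsymbol{0}) \cdot \breve{\boldsymbol{x}} - \tfrac{1}{2} \cdot \nabla_f(\boldsymbol{0}) \cdot \breve{\boldsymbol{x}} \\
&= f(\boldsymbol{0}) + \tfrac{1}{2} \cdot \nabla_f(\boldsymbol{0}) \cdot \breve{\boldsymbol{x}} ,
\end{aligned} \qquad \text{(C.63)} $$

again for the expansion point $\boldsymbol{a} = \boldsymbol{0}$. For an arbitrary expansion point $\boldsymbol{a}$, the corresponding result is

$$ \tilde f_{\mathrm{extrm}} = \tilde f(\breve{\boldsymbol{x}}) = f(\boldsymbol{a}) + \tfrac{1}{2} \cdot \nabla_f(\boldsymbol{a}) \cdot (\breve{\boldsymbol{x}} - \boldsymbol{a}) . \qquad \text{(C.64)} $$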

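Note that $\tilde f_{\mathrm{extrm}}$ may be a local minimum or maximum, but could also be a saddle point, where the first derivatives of the function are zero as well. The eigenvalues of $\mathbf{H}_f$ decide which case applies. If all are negative, the point is a maximum; if all are positive, it is a minimum; mixed signs indicate a saddle point.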
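Eqns. (C.62) and (C.64) need $\mathbf{H}_f^{-1}(\boldsymbol{a}) \cdot \nabla_f(\boldsymbol{a})$, but it is usually preferable to solve the linear system $\mathbf{H}_f \cdot \boldsymbol{\delta} = -\nabla_f$ for the offset $\boldsymbol{\delta} = \breve{\boldsymbol{x}} - \boldsymbol{a}$ rather than form the inverse explicitly. The Java sketch below does this for any n using Gaussian elimination with partial pivoting, which is one common choice among several. The class name and the singularity threshold `1e-12` are assumptions, not part of the text. The sketch does not classify the result as a minimum, maximum, or saddle point.

```java
/** Sketch of Eqns. (C.62) and (C.64): solves H * delta = -grad instead of inverting H. */
final class TaylorExtremum {

    /** Returns the extremal position a + delta, where H(a) * delta = -grad(a). */
    static double[] position(double[] a, double[] grad, double[][] hess) {
        double[] rhs = new double[grad.length];
        for (int k = 0; k < grad.length; k++) {
            rhs[k] = -grad[k];
        }
        double[] delta = solve(hess, rhs);
        double[] x = new double[a.length];
        for (int k = 0; k < a.length; k++) {
            x[k] = a[k] + delta[k];
        }
        return x;
    }

    /** Returns the extremal value f(a) + 0.5 * grad(a) . (x - a), Eqn. (C.64). */
    static double value(double fa, double[] a, double[] grad, double[] x) {
        double s = 0;
        for (int k = 0; k < a.length; k++) {
            s += grad[k] * (x[k] - a[k]);
        }
        return fa + 0.5 * s;
    }

    /** Gaussian elimination with partial pivoting; throws if A is (numerically) singular. */
    private static double[] solve(double[][] A, double[] b) {
        int n = b.length;
        double[][] m = new double[n][n + 1];   // augmented copy [A | b]
        for (int i = 0; i < n; i++) {
            System.arraycopy(A[i], 0, m[i], 0, n);
            m[i][n] = b[i];
        }
        for (int col = 0; col < n; col++) {
            int pivot = col;
            for (int r = col + 1; r < n; r++) {
                if (Math.abs(m[r][col]) > Math.abs(m[pivot][col])) {
                    pivot = r;
                }
            }
            if (Math.abs(m[pivot][col]) < 1e-12) {
                throw new ArithmeticException("Hessian is singular, no unique extremum");
            }
            double[] tmp = m[col];
            m[col] = m[pivot];
            m[pivot] = tmp;
            for (int r = col + 1; r < n; r++) {
                double factor = m[r][col] / m[col][col];
                for (int c = col; c <= n; c++) {
                    m[r][c] -= factor * m[col][c];
                }
            }
        }
        double[] x = new double[n];            // back substitution
        for (int i = n - 1; i >= 0; i--) {
            double s = m[i][n];
            for (int c = i + 1; c < n; c++) {
                s -= m[i][c] * x[c];
            }
            x[i] = s / m[i][i];
        }
        return x;
    }
}
```

With the gradient (-3, 0.5) and Hessian from the numeric 2D example later in this section, `position` returns approximately (-0.3832, 0.0875) relative to the expansion point, and `value` gives approximately 16.5967.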

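Local extrema in 2D

The scheme above can be applied to n-dimensional functions. In the special case of a 2D function $f: \mathbb{R}^2 \to \mathbb{R}$ (e.g., a 2D image), the gradient vector and the Hessian matrix for the given expansion point $\boldsymbol{a} = (x_a, y_a)^{\mathsf{T}}$ can be written as

$$ \nabla_f(\boldsymbol{a}) = \begin{pmatrix} d_x \\ d_y \end{pmatrix} \quad\text{and}\quad \mathbf{H}_f(\boldsymbol{a}) = \begin{pmatrix} H_{00} & H_{01} \\ H_{01} & H_{11} \end{pmatrix} . \qquad \text{(C.65)} $$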

In this case, the inverse of the Hessian matrix is

$$ \mathbf{H}_f^{-1}(\boldsymbol{a}) = \frac{1}{H_{01}^2 - H_{00} \cdot H_{11}} \cdot \begin{pmatrix} -H_{11} & H_{01} \\ H_{01} & -H_{00} \end{pmatrix} \qquad \text{(C.66)} $$
and  the  resulting  position  of  the  extremal  point  is  (see  Eqn.  (C.62)) 


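$$ \breve{\boldsymbol{x}} = \begin{pmatrix} \breve{x} \\ \breve{y} \end{pmatrix} = \begin{pmatrix} x_a \\ y_a \end{pmatrix} - \frac{1}{H_{01}^2 - H_{00} \cdot H_{11}} \cdot \begin{pmatrix} -H_{11} & H_{01} \\ H_{01} & -H_{00} \end{pmatrix} \cdot \begin{pmatrix} d_x \\ d_y \end{pmatrix} \qquad \text{(C.67)} $$

$$ = \begin{pmatrix} x_a \\ y_a \end{pmatrix} + \frac{1}{H_{01}^2 - H_{00} \cdot H_{11}} \cdot \begin{pmatrix} H_{11} \cdot d_x - H_{01} \cdot d_y \\ H_{00} \cdot d_y - H_{01} \cdot d_x \end{pmatrix} . \qquad \text{(C.68)} $$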
The extremal position is only defined if the denominator in Eqn. (C.68), $H_{01}^2 - H_{00} \cdot H_{11}$ (which equals the negative determinant of $\mathbf{H}_f$), is non-zero. In that case the Hessian matrix $\mathbf{H}_f$ is non-singular and thus has an inverse. The associated value of $\tilde f$ at the estimated extremal position $\breve{\boldsymbol{x}} = (\breve{x}, \breve{y})^{\mathsf{T}}$ can now be calculated using Eqn. (C.64) as


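$$ \tilde f(\breve{x}, \breve{y}) = f(x_a, y_a) + \frac{1}{2} \cdot (d_x, d_y) \cdot \begin{pmatrix} \breve{x} - x_a \\ \breve{y} - y_a \end{pmatrix} = f(x_a, y_a) + \frac{d_x \cdot (\breve{x} - x_a) + d_y \cdot (\breve{y} - y_a)}{2} . \qquad \text{(C.69)} $$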
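The closed-form 2D result of Eqns. (C.68) and (C.69) fits in a few lines. The following Java sketch is illustrative. The class name, the `{x, y, value}` return layout, and the `null` convention for a singular Hessian are assumptions made here, not part of the text.

```java
/** Closed-form 2D extremum after Eqns. (C.68) and (C.69); names are illustrative. */
final class Extremum2D {

    /**
     * Returns {x, y, value} for the extremum of the quadratic Taylor approximation about
     * (xa, ya), or null if the denominator of Eqn. (C.68) is (numerically) zero.
     */
    static double[] find(double xa, double ya, double fa,
                         double dx, double dy, double h00, double h01, double h11) {
        double den = h01 * h01 - h00 * h11;   // equals -det(H)
        if (Math.abs(den) < 1e-12) {
            return null;                      // Hessian singular, no unique extremum
        }
        double x = xa + (h11 * dx - h01 * dy) / den;   // Eqn. (C.68), first row
        double y = ya + (h00 * dy - h01 * dx) / den;   // Eqn. (C.68), second row
        double value = fa + (dx * (x - xa) + dy * (y - ya)) / 2;   // Eqn. (C.69)
        return new double[] {x, y, value};
    }
}
```

With the values of the numeric example below (f(a) = 16, d_x = -3, d_y = 0.5, H_00 = -8, H_01 = -0.75, H_11 = -9, expansion point (0, 0)), this returns approximately (-0.3832, 0.0875, 16.5967).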

Numeric 2D example

The following example shows how a local extremum can be found in a discrete 2D image with sub-pixel accuracy using a second-order Taylor approximation. Assume we are given a grayscale image $f: \mathbb{Z} \times \mathbb{Z} \to \mathbb{R}$ with the sample values


$$ \begin{array}{c|ccc} & u_a - 1 & u_a & u_a + 1 \\ \hline v_a - 1 & 8 & 11 & 7 \\ v_a & 15 & 16 & 9 \\ v_a + 1 & 14 & 12 & 10 \end{array} \qquad \text{(C.70)} $$


in the 3×3 neighborhood of position $\boldsymbol{a} = (u_a, v_a)^{\mathsf{T}}$. Obviously, the discrete center value $f(\boldsymbol{a}) = 16$ is a local maximum but (as we shall see) the maximum of the continuous approximation function is not at the center. The gradient vector $\nabla_f$ and the Hessian matrix $\mathbf{H}_f$ at the expansion point $\boldsymbol{a}$ are calculated from local finite differences (see Sec. C.3.1) as

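$$ \nabla_f(\boldsymbol{a}) = \begin{pmatrix} d_x \\ d_y \end{pmatrix} = \begin{pmatrix} 0.5 \cdot (9 - 15) \\ 0.5 \cdot (12 - 11) \end{pmatrix} = \begin{pmatrix} -3 \\ 0.5 \end{pmatrix} \quad\text{and} \qquad \text{(C.71)} $$

$$ \mathbf{H}_f(\boldsymbol{a}) = \begin{pmatrix} H_{00} & H_{01} \\ H_{01} & H_{11} \end{pmatrix} = \begin{pmatrix} 9 - 2 \cdot 16 + 15 & 0.25 \cdot (8 - 14 - 7 + 10) \\ 0.25 \cdot (8 - 14 - 7 + 10) & 11 - 2 \cdot 16 + 12 \end{pmatrix} = \begin{pmatrix} -8.00 & -0.75 \\ -0.75 & -9.00 \end{pmatrix} , \qquad \text{(C.72)} $$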
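The numbers in Eqns. (C.71) and (C.72) correspond to central differences over the 3×3 neighborhood. The Java sketch below reproduces them. Note that the difference scheme is inferred from the arithmetic shown above; Sec. C.3.1 is the authoritative description. The array layout `nb[row][col]`, with rows indexing v and columns indexing u, is an assumption of this sketch, and the class name is hypothetical.

```java
/** Gradient and Hessian from a 3x3 neighborhood by central differences, as in Eqns. (C.71)-(C.72). */
final class FiniteDifferences3x3 {

    /**
     * nb[j][i] holds the sample at (u_a + i - 1, v_a + j - 1), i.e., rows are v, columns are u.
     * Returns {dx, dy, h00, h01, h11}.
     */
    static double[] derivatives(double[][] nb) {
        double c = nb[1][1];                                   // center sample f(a)
        double dx = 0.5 * (nb[1][2] - nb[1][0]);               // (f(u+1,v) - f(u-1,v)) / 2
        double dy = 0.5 * (nb[2][1] - nb[0][1]);               // (f(u,v+1) - f(u,v-1)) / 2
        double h00 = nb[1][2] - 2 * c + nb[1][0];              // second derivative in u
        double h11 = nb[2][1] - 2 * c + nb[0][1];              // second derivative in v
        double h01 = 0.25 * (nb[0][0] - nb[2][0] - nb[0][2] + nb[2][2]);   // mixed derivative
        return new double[] {dx, dy, h00, h01, h11};
    }
}
```

Applied to the neighborhood of Eqn. (C.70), it yields d_x = -3, d_y = 0.5, H_00 = -8, H_01 = -0.75, and H_11 = -9.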


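respectively. The resulting second-order Taylor expansion about the point $\boldsymbol{a}$ is the continuous function (see Eqn. (C.52))

$$ \tilde f(\boldsymbol{x}) = f(\boldsymbol{a}) + \nabla_f(\boldsymbol{a}) \cdot (\boldsymbol{x} - \boldsymbol{a}) + \frac{1}{2} \cdot (\boldsymbol{x} - \boldsymbol{a})^{\mathsf{T}} \cdot \mathbf{H}_f(\boldsymbol{a}) \cdot (\boldsymbol{x} - \boldsymbol{a}) . \qquad \text{(C.73)} $$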
We use the inverse of the 2×2 Hessian matrix at position $\boldsymbol{a}$ (see Eqn. (C.66)),

$$ \mathbf{H}_f^{-1}(\boldsymbol{a}) = \begin{pmatrix} -8.00 & -0.75 \\ -0.75 & -9.00 \end{pmatrix}^{-1} = \begin{pmatrix} -0.125984 & 0.010499 \\ 0.010499 & -0.111986 \end{pmatrix} , \qquad \text{(C.74)} $$


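to calculate the position of the local extremum $\breve{\boldsymbol{x}}$ (see Eqn. (C.68)) as

$$ \breve{\boldsymbol{x}} = \boldsymbol{a} - \mathbf{H}_f^{-1}(\boldsymbol{a}) \cdot \nabla_f(\boldsymbol{a}) = \begin{pmatrix} u_a \\ v_a \end{pmatrix} - \begin{pmatrix} -0.125984 & 0.010499 \\ 0.010499 & -0.111986 \end{pmatrix} \cdot \begin{pmatrix} -3 \\ 0.5 \end{pmatrix} = \begin{pmatrix} u_a - 0.3832 \\ v_a + 0.0875 \end{pmatrix} . \qquad \text{(C.75)} $$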


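Finally, the extremal value (see Eqn. (C.64)) is found as

$$ \begin{aligned}
\tilde f(\breve{\boldsymbol{x}}) &= f(\boldsymbol{a}) + \tfrac{1}{2} \cdot \nabla_f(\boldsymbol{a}) \cdot (\breve{\boldsymbol{x}} - \boldsymbol{a}) \\
&= 16 + \tfrac{1}{2} \cdot (-3,\; 0.5) \cdot \begin{pmatrix} u_a - 0.3832 - u_a \\ v_a + 0.0875 - v_a \end{pmatrix} \\
&= 16 + \tfrac{1}{2} \cdot (3 \cdot 0.3832 + 0.5 \cdot 0.0875) = 16.5967 .
\end{aligned} \qquad \text{(C.76)} $$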


Fig. C.3 illustrates this example, with the expansion point set to $\boldsymbol{a} = (u_a, v_a)^{\mathsf{T}} = (0, 0)^{\mathsf{T}}$.


Fig. C.3 Continuous Taylor approximation of a discrete 2D image function for determining the local extremum position with sub-pixel accuracy. The cubes represent the discrete image samples in a 3×3 neighborhood around the reference coordinate (0, 0), which is a local maximum of the discrete image function (see Eqn. (C.70) for the concrete values). The parabolic surface shows the continuous approximation $\tilde f(x, y)$ obtained by second-order Taylor expansion about the center position $\boldsymbol{a} = (0, 0)$. The vertical line marks the position of the local maximum $\tilde f(\breve{\boldsymbol{x}}) = 16.5967$ at $\breve{\boldsymbol{x}} = (-0.3832, 0.0875)$.


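Local extrema in 3D

In the case of a three-variable, scalar function $f: \mathbb{R}^3 \to \mathbb{R}$, with a given expansion point $\boldsymbol{a} = (x_a, y_a, z_a)^{\mathsf{T}}$ and

$$ \nabla_f(\boldsymbol{a}) = \begin{pmatrix} d_x \\ d_y \\ d_z \end{pmatrix} \quad\text{and}\quad \mathbf{H}_f(\boldsymbol{a}) = \begin{pmatrix} H_{00} & H_{01} & H_{02} \\ H_{01} & H_{11} & H_{12} \\ H_{02} & H_{12} & H_{22} \end{pmatrix} \qquad \text{(C.77)} $$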
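being the gradient vector and the Hessian matrix of f at point $\boldsymbol{a}$, respectively, the estimated extremal position is

$$ \breve{\boldsymbol{x}} = (\breve{x}, \breve{y}, \breve{z})^{\mathsf{T}} = \boldsymbol{a} - \mathbf{H}_f^{-1}(\boldsymbol{a}) \cdot \nabla_f(\boldsymbol{a}) $$

$$ = \begin{pmatrix} x_a \\ y_a \\ z_a \end{pmatrix} - \frac{1}{H_{02}^2 \cdot H_{11} + H_{01}^2 \cdot H_{22} + H_{00} \cdot H_{12}^2 - H_{00} \cdot H_{11} \cdot H_{22} - 2 \cdot H_{01} \cdot H_{02} \cdot H_{12}} \cdot \begin{pmatrix} H_{12}^2 - H_{11} \cdot H_{22} & H_{01} \cdot H_{22} - H_{02} \cdot H_{12} & H_{02} \cdot H_{11} - H_{01} \cdot H_{12} \\ H_{01} \cdot H_{22} - H_{02} \cdot H_{12} & H_{02}^2 - H_{00} \cdot H_{22} & H_{00} \cdot H_{12} - H_{01} \cdot H_{02} \\ H_{02} \cdot H_{11} - H_{01} \cdot H_{12} & H_{00} \cdot H_{12} - H_{01} \cdot H_{02} & H_{01}^2 - H_{00} \cdot H_{11} \end{pmatrix} \cdot \begin{pmatrix} d_x \\ d_y \\ d_z \end{pmatrix} . \qquad \text{(C.78)} $$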


Note that the inverse of the 3×3 Hessian matrix $\mathbf{H}_f^{-1}$ is again symmetric and can be calculated in closed form (as shown in Eqn. (C.78)).4

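Again using Eqn. (C.64), the estimated extremal value at position $\breve{\boldsymbol{x}} = (\breve{x}, \breve{y}, \breve{z})^{\mathsf{T}}$ is found as

$$ \tilde f(\breve{\boldsymbol{x}}) = f(\boldsymbol{a}) + \frac{1}{2} \cdot \nabla_f(\boldsymbol{a}) \cdot (\breve{\boldsymbol{x}} - \boldsymbol{a}) \qquad \text{(C.79)} $$

$$ = f(\boldsymbol{a}) + \frac{d_x \cdot (\breve{x} - x_a) + d_y \cdot (\breve{y} - y_a) + d_z \cdot (\breve{z} - z_a)}{2} . \qquad \text{(C.80)} $$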
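The closed-form 3D solution in Eqn. (C.78) can be coded directly from the six distinct Hessian entries. The Java sketch below is illustrative: the class name, the parameter order, and the `null` return for a singular Hessian are assumptions. A robust alternative is the generic linear-system approach sketched for Eqn. (C.62).

```java
/** Closed-form 3D extremum after Eqns. (C.78) and (C.80) for a symmetric Hessian; illustrative names. */
final class Extremum3D {

    /** Returns a - H^{-1} * g for the symmetric H given by h00..h22, or null if H is singular. */
    static double[] position(double[] a, double[] g,
                             double h00, double h01, double h02,
                             double h11, double h12, double h22) {
        // Denominator of Eqn. (C.78); it equals -det(H).
        double den = h02 * h02 * h11 + h01 * h01 * h22 + h00 * h12 * h12
                   - h00 * h11 * h22 - 2 * h01 * h02 * h12;
        if (Math.abs(den) < 1e-12) {
            return null;
        }
        // Entries of the symmetric matrix M in Eqn. (C.78), so that H^{-1} = M / den.
        double m00 = h12 * h12 - h11 * h22;
        double m01 = h01 * h22 - h02 * h12;
        double m02 = h02 * h11 - h01 * h12;
        double m11 = h02 * h02 - h00 * h22;
        double m12 = h00 * h12 - h01 * h02;
        double m22 = h01 * h01 - h00 * h11;
        return new double[] {
            a[0] - (m00 * g[0] + m01 * g[1] + m02 * g[2]) / den,
            a[1] - (m01 * g[0] + m11 * g[1] + m12 * g[2]) / den,
            a[2] - (m02 * g[0] + m12 * g[1] + m22 * g[2]) / den
        };
    }

    /** Extremal value after Eqn. (C.80). */
    static double value(double fa, double[] a, double[] g, double[] x) {
        return fa + (g[0] * (x[0] - a[0]) + g[1] * (x[1] - a[1]) + g[2] * (x[2] - a[2])) / 2;
    }
}
```

A quick consistency check is that the returned position satisfies $\mathbf{H}_f \cdot (\breve{\boldsymbol{x}} - \boldsymbol{a}) = -\nabla_f$ up to rounding.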


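4 Nevertheless, the use of standard numerical methods is recommended (e.g., solving the linear system $\mathbf{H}_f \cdot (\breve{\boldsymbol{x}} - \boldsymbol{a}) = -\nabla_f$ with a linear-equation solver instead of explicit inversion).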
Appendix D

Statistical Prerequisites


This part summarizes some essential statistical concepts for vector-valued data, intended as a supplement particularly to Chapters 11 and 17.


D.1  Mean,  Variance,  and  Covariance 

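For the following definitions we assume a sequence $X = (\boldsymbol{x}_0, \boldsymbol{x}_1, \ldots, \boldsymbol{x}_{n-1})$ of n vector-valued, m-dimensional measurements, with "samples"

$$ \boldsymbol{x}_i = (x_{i,0}, x_{i,1}, \ldots, x_{i,m-1})^{\mathsf{T}} \in \mathbb{R}^m . \qquad \text{(D.1)} $$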

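D.1.1 Mean

The m-dimensional sample mean vector is defined as

$$ \boldsymbol{\mu}(X) = (\mu_0, \mu_1, \ldots, \mu_{m-1})^{\mathsf{T}} \qquad \text{(D.2)} $$

$$ = \frac{1}{n} \cdot (\boldsymbol{x}_0 + \boldsymbol{x}_1 + \cdots + \boldsymbol{x}_{n-1}) = \frac{1}{n} \cdot \sum_{i=0}^{n-1} \boldsymbol{x}_i . \qquad \text{(D.3)} $$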

Geometrically speaking, the vector $\boldsymbol{\mu}(X)$ corresponds to the centroid of the sample vectors $\boldsymbol{x}_i$ in m-dimensional space. Each scalar element $\mu_p$ is the mean of the associated component (also called variate or dimension) p over all n samples, that is

$$ \mu_p = \frac{1}{n} \cdot \sum_{i=0}^{n-1} x_{i,p} , \qquad \text{(D.4)} $$

for $p = 0, \ldots, m-1$.


D.1.2  Variance  and  Covariance 

The  covariance  quantifies  the  strength  of  interaction  between  a  pair 
of  components  p,  q  in  the  sample  X,  defined  as 
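$$ \sigma_{p,q}(X) = \frac{1}{n} \cdot \sum_{i=0}^{n-1} (x_{i,p} - \mu_p) \cdot (x_{i,q} - \mu_q) . \qquad \text{(D.5)} $$

For efficient calculation, this expression can be rewritten in the form

$$ \sigma_{p,q}(X) = \frac{1}{n} \cdot \Bigl[ \sum_{i=0}^{n-1} x_{i,p} \cdot x_{i,q} - \frac{1}{n} \cdot \Bigl( \sum_{i=0}^{n-1} x_{i,p} \Bigr) \cdot \Bigl( \sum_{i=0}^{n-1} x_{i,q} \Bigr) \Bigr] , \qquad \text{(D.6)} $$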


which does not require the explicit calculation of $\mu_p$ and $\mu_q$. In the special case of p = q, we get


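$$ \sigma_{p,p}(X) = \sigma_p^2(X) = \frac{1}{n} \cdot \sum_{i=0}^{n-1} (x_{i,p} - \mu_p)^2 \qquad \text{(D.7)} $$

$$ = \frac{1}{n} \cdot \Bigl[ \sum_{i=0}^{n-1} x_{i,p}^2 - \frac{1}{n} \cdot \Bigl( \sum_{i=0}^{n-1} x_{i,p} \Bigr)^2 \Bigr] , \qquad \text{(D.8)} $$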


which is the variance within the component p. This corresponds to the ordinary (one-dimensional) variance $\sigma_p^2(X)$ of the n scalar sample values $x_{0,p}, x_{1,p}, \ldots, x_{n-1,p}$ (see also Sec. 3.7.1).


D.1.3 Biased vs. Unbiased Variance

If the variance (or covariance) of some population is estimated from a small set of random samples, the results obtained by the formulation given in the previous section are known to be statistically biased.1 The most common form of correcting for this bias is to use the factor $1/(n-1)$ instead of $1/n$ in the variance calculations. For example, Eqn. (D.5) would change to


$$ \sigma_{p,q}(X) = \frac{1}{n-1} \cdot \sum_{i=0}^{n-1} (x_{i,p} - \mu_p) \cdot (x_{i,q} - \mu_q) \qquad \text{(D.9)} $$


to yield an unbiased sample variance. In the following (and throughout the text), we ignore the bias issue and consistently use the factor $1/n$ for all variance calculations. Note, however, that many software packages2 use the bias-corrected factor $1/(n-1)$ by default and thus may return different results. These can be easily converted for comparison by multiplying with $(n-1)/n$ or its reciprocal.
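To make the difference between Eqns. (D.5), (D.6), and (D.9) concrete, the following Java sketch computes a covariance matrix with either normalization, plus a one-pass variant of Eqn. (D.6). The class and method names are hypothetical. The one-pass form avoids computing the mean first, but it subtracts two large sums and can lose precision through cancellation when the data have a large offset. The two-pass form is therefore the default here.

```java
/** Covariance matrix of n samples (rows of X) after Eqns. (D.5), (D.6), and (D.9); illustrative names. */
final class SampleCovariance {

    /**
     * Two-pass computation: first the mean (Eqn. (D.4)), then the centered products (Eqn. (D.5)).
     * If biasCorrected is true, the factor 1/(n-1) of Eqn. (D.9) is used instead of 1/n.
     */
    static double[][] covariance(double[][] X, boolean biasCorrected) {
        int n = X.length;
        int m = X[0].length;
        double[] mu = new double[m];
        for (double[] x : X) {
            for (int p = 0; p < m; p++) {
                mu[p] += x[p] / n;
            }
        }
        double scale = biasCorrected ? 1.0 / (n - 1) : 1.0 / n;
        double[][] S = new double[m][m];
        for (int p = 0; p < m; p++) {
            for (int q = p; q < m; q++) {
                double s = 0;
                for (double[] x : X) {
                    s += (x[p] - mu[p]) * (x[q] - mu[q]);
                }
                S[p][q] = S[q][p] = s * scale;   // symmetric: sigma_pq = sigma_qp
            }
        }
        return S;
    }

    /** One-pass variant after Eqn. (D.6), using only running sums (factor 1/n). */
    static double covarianceOnePass(double[][] X, int p, int q) {
        int n = X.length;
        double sp = 0, sq = 0, spq = 0;
        for (double[] x : X) {
            sp += x[p];
            sq += x[q];
            spq += x[p] * x[q];
        }
        return (spq - sp * sq / n) / n;
    }
}
```

The bias-corrected result equals the uncorrected one multiplied by $n/(n-1)$. For example, with n = 4 the diagonal entry 972.1875 of the example in Sec. D.2.1 becomes 1296.25.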


D.2 The Covariance Matrix

The covariance matrix $\Sigma$ for the m-dimensional sample X is a square matrix of size m × m that is composed of the covariance values $\sigma_{p,q}$ for all pairs (p, q) of components, that is,

1  Note  that  the  estimation  of  the  mean  by  the  sample  mean  (Eqn.  (D.3)) 
is  not  affected  by  this  bias  problem. 

2 For example, Apache Commons Math, Matlab, Mathematica.




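$$ \Sigma(X) = \begin{pmatrix} \sigma_{0,0} & \sigma_{0,1} & \cdots & \sigma_{0,m-1} \\ \sigma_{1,0} & \sigma_{1,1} & \cdots & \sigma_{1,m-1} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{m-1,0} & \sigma_{m-1,1} & \cdots & \sigma_{m-1,m-1} \end{pmatrix} \qquad \text{(D.10)} $$

$$ = \begin{pmatrix} \sigma_0^2 & \sigma_{0,1} & \cdots & \sigma_{0,m-1} \\ \sigma_{1,0} & \sigma_1^2 & \cdots & \sigma_{1,m-1} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{m-1,0} & \sigma_{m-1,1} & \cdots & \sigma_{m-1}^2 \end{pmatrix} . \qquad \text{(D.11)} $$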


Note that any diagonal element of $\Sigma(X)$ is the ordinary (scalar) variance $\sigma_p^2(X)$ (see Eqn. (D.7)), for $p = 0, \ldots, m-1$, which can never be negative. All other entries of a covariance matrix may be positive or negative in general. Since $\sigma_{p,q} = \sigma_{q,p}$, a covariance matrix is always symmetric, with up to $(m^2 + m)/2$ unique elements. Moreover, any covariance matrix has the important property of being positive semidefinite, because $\boldsymbol{v}^{\mathsf{T}} \cdot \Sigma(X) \cdot \boldsymbol{v} = \frac{1}{n} \sum_{i} \bigl( \boldsymbol{v}^{\mathsf{T}} \cdot (\boldsymbol{x}_i - \boldsymbol{\mu}(X)) \bigr)^2 \ge 0$ for any vector $\boldsymbol{v} \in \mathbb{R}^m$. This implies that all its eigenvalues (see Sec. B.4) are non-negative. The covariance matrix can also be written in the form


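$$ \Sigma(X) = \frac{1}{n} \cdot \sum_{i=0}^{n-1} [\boldsymbol{x}_i - \boldsymbol{\mu}(X)] \cdot [\boldsymbol{x}_i - \boldsymbol{\mu}(X)]^{\mathsf{T}} = \frac{1}{n} \cdot \sum_{i=0}^{n-1} [\boldsymbol{x}_i - \boldsymbol{\mu}(X)] \otimes [\boldsymbol{x}_i - \boldsymbol{\mu}(X)] , \qquad \text{(D.12)} $$

where $\otimes$ denotes the outer (vector) product.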

The trace (sum of the diagonal elements) of the covariance matrix,

$$ \sigma_{\mathrm{total}}(X) = \operatorname{trace}(\Sigma(X)) , \qquad \text{(D.13)} $$

is called the total variance of the multivariate sample. Alternatively, the (Frobenius) norm of the covariance matrix $\Sigma(X)$, defined as

$$ \|\Sigma(X)\|_2 = \Bigl( \sum_{i=0}^{m-1} \sum_{j=0}^{m-1} \sigma_{i,j}^2 \Bigr)^{1/2} , \qquad \text{(D.14)} $$

can be used to quantify the overall variance in the sample data.


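D.2.1 Example

Assume that the sample X consists of the following set of four 3D vectors (i.e., m = 3 and n = 4)

$$ X = (\boldsymbol{x}_0, \boldsymbol{x}_1, \boldsymbol{x}_2, \boldsymbol{x}_3) = \left( \begin{pmatrix} 75 \\ 37 \\ 12 \end{pmatrix}, \begin{pmatrix} 41 \\ 27 \\ 20 \end{pmatrix}, \begin{pmatrix} 93 \\ 81 \\ 11 \end{pmatrix}, \begin{pmatrix} 12 \\ 48 \\ 52 \end{pmatrix} \right) , $$

with each $\boldsymbol{x}_i = (x_{i,R}, x_{i,G}, x_{i,B})^{\mathsf{T}}$ representing a particular RGB color. The resulting sample mean vector (see Eqn. (D.3)) is

$$ \boldsymbol{\mu}(X) = \begin{pmatrix} \mu_R \\ \mu_G \\ \mu_B \end{pmatrix} = \frac{1}{4} \cdot \begin{pmatrix} 75 + 41 + 93 + 12 \\ 37 + 27 + 81 + 48 \\ 12 + 20 + 11 + 52 \end{pmatrix} = \frac{1}{4} \cdot \begin{pmatrix} 221 \\ 193 \\ 95 \end{pmatrix} = \begin{pmatrix} 55.25 \\ 48.25 \\ 23.75 \end{pmatrix} , $$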


and the associated covariance matrix (Eqn. (D.11)) is

$$ \Sigma(X) = \begin{pmatrix} 972.188 & 331.938 & -470.438 \\ 331.938 & 412.688 & -53.188 \\ -470.438 & -53.188 & 278.188 \end{pmatrix} . $$

As predicted, this matrix is symmetric and all diagonal elements are non-negative. Note that no sample bias correction (see Sec. D.1.3) was used in this example. The total variance (Eqn. (D.13)) of the sample set is

$$ \sigma_{\mathrm{total}}(X) = \operatorname{trace}(\Sigma(X)) = 972.188 + 412.688 + 278.188 \approx 1663.06 , $$

and the Frobenius norm of the covariance matrix (see Eqn. (D.14)) is $\|\Sigma(X)\|_2 \approx 1364.36$.
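The two scalar summaries of Eqns. (D.13) and (D.14) are straightforward to compute. The Java sketch below uses hypothetical names and applies them to the unrounded covariance matrix of the example above, i.e., with entries 972.1875, 331.9375, and so on, which were rounded to three decimals in the text.

```java
/** Total variance (Eqn. (D.13)) and Frobenius norm (Eqn. (D.14)) of a covariance matrix. */
final class CovarianceSummary {

    /** Sum of the diagonal elements, i.e., the trace of S. */
    static double totalVariance(double[][] S) {
        double t = 0;
        for (int i = 0; i < S.length; i++) {
            t += S[i][i];
        }
        return t;
    }

    /** Square root of the sum of all squared matrix elements. */
    static double frobeniusNorm(double[][] S) {
        double s = 0;
        for (double[] row : S) {
            for (double v : row) {
                s += v * v;
            }
        }
        return Math.sqrt(s);
    }
}
```

For the example matrix, `totalVariance` returns exactly 1663.0625, and `frobeniusNorm` returns a value within 0.01 of the 1364.36 quoted above.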

D.2.2 Practical Calculation

The calculation of covariance matrices is implemented in almost any software package for statistical analysis or linear algebra. For example, with the Apache Commons Math library this could be accomplished as follows:

import org.apache.commons.math3.linear.RealMatrix;
import org.apache.commons.math3.stat.correlation.Covariance;

...

double[][] X = ...;  // X[i] is the i-th sample vector (one row per sample)
Covariance cov = new Covariance(X, false);  // false: no bias correction (factor 1/n)
RealMatrix S = cov.getCovarianceMatrix();   // the m x m covariance matrix
D.3 Mahalanobis Distance

The Mahalanobis distance3 [157] is used to measure distances in multi-dimensional distributions. Unlike the Euclidean distance, it takes into account the amount of scatter in the distribution and the correlation between features. In particular, the Mahalanobis distance can be used to measure distances in distributions where the individual components differ substantially in scale. Depending on their scale, a few components (or even a single component) may dominate the ordinary (Euclidean) distance, so that the "smaller" components have practically no influence.


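D.3.1 Definition

Given a distribution of m-dimensional samples $X = (\boldsymbol{x}_0, \ldots, \boldsymbol{x}_{n-1})$, with $\boldsymbol{x}_k \in \mathbb{R}^m$, the Mahalanobis distance between two samples $\boldsymbol{x}_a, \boldsymbol{x}_b$ is defined as

$$ d_M(\boldsymbol{x}_a, \boldsymbol{x}_b) = \|\boldsymbol{x}_a - \boldsymbol{x}_b\|_M = \sqrt{(\boldsymbol{x}_a - \boldsymbol{x}_b)^{\mathsf{T}} \cdot \Sigma^{-1} \cdot (\boldsymbol{x}_a - \boldsymbol{x}_b)} , \qquad \text{(D.15)} $$

where $\Sigma$ is the m × m covariance matrix of the distribution X (see Sec. D.2).4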

3  http://en.wikipedia.org/wiki/Mahalanobis_distance. 

4 Note that the expression under the root in Eqn. (D.15) is the (dot) product of a row vector and a column vector, that is, the result is a scalar value. It is non-negative as long as $\Sigma^{-1}$ is positive semidefinite (see Sec. D.3.3).


In effect, the Mahalanobis distance normalizes the feature components to unit variance and removes the correlation between them. This makes the distance calculation independent of the scale of the individual components, that is, all components are "treated fairly" even if their ranges differ by many orders of magnitude. In other words, no component can dominate the others even if its magnitude is disproportionately large.
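Eqn. (D.15) only requires a quadratic form with the inverse covariance matrix. The Java sketch below uses hypothetical names and assumes that $\Sigma^{-1}$ has already been computed and is positive semidefinite. Otherwise the quantity under the root could become negative and `Math.sqrt` would return NaN (see Sec. D.3.3).

```java
/** Mahalanobis distance after Eqn. (D.15), given a precomputed inverse covariance matrix. */
final class Mahalanobis {

    static double distance(double[] xa, double[] xb, double[][] sigmaInv) {
        int m = xa.length;
        double[] y = new double[m];            // difference vector x_a - x_b
        for (int k = 0; k < m; k++) {
            y[k] = xa[k] - xb[k];
        }
        double q = 0;                          // quadratic form y^T * Sigma^{-1} * y
        for (int i = 0; i < m; i++) {
            for (int j = 0; j < m; j++) {
                q += y[i] * sigmaInv[i][j] * y[j];
            }
        }
        return Math.sqrt(q);
    }
}
```

For example, with $\Sigma = \mathrm{diag}(100, 1)$ the points (10, 0) and (0, 1) are both at Mahalanobis distance 1 from the origin, although their Euclidean distances are 10 and 1. With $\Sigma^{-1} = \mathbf{I}$ the result reduces to the Euclidean distance.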


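D.3.2 Relation to the Euclidean Distance

Recall that the Euclidean distance between two points $\boldsymbol{x}_a, \boldsymbol{x}_b$ in $\mathbb{R}^m$ is equivalent to the ($L_2$) norm of the difference vector $\boldsymbol{x}_a - \boldsymbol{x}_b$, which can be written in the form

$$ d_E(\boldsymbol{x}_a, \boldsymbol{x}_b) = \|\boldsymbol{x}_a - \boldsymbol{x}_b\| = \sqrt{(\boldsymbol{x}_a - \boldsymbol{x}_b)^{\mathsf{T}} \cdot (\boldsymbol{x}_a - \boldsymbol{x}_b)} . \qquad \text{(D.16)} $$

Note the structural similarity with the definition of the Mahalanobis distance in Eqn. (D.15), the only difference being the missing matrix $\Sigma^{-1}$. This becomes even clearer if we analogously insert the identity matrix $\mathbf{I}$ into Eqn. (D.16), that is,

$$ d_E(\boldsymbol{x}_a, \boldsymbol{x}_b) = \|\boldsymbol{x}_a - \boldsymbol{x}_b\| = \sqrt{(\boldsymbol{x}_a - \boldsymbol{x}_b)^{\mathsf{T}} \cdot \mathbf{I} \cdot (\boldsymbol{x}_a - \boldsymbol{x}_b)} , \qquad \text{(D.17)} $$

which obviously does not change the outcome. The purpose of $\Sigma^{-1}$ in Eqn. (D.15) is to map the difference vectors (and thus the involved vectors $\boldsymbol{x}_a, \boldsymbol{x}_b$) into a transformed (scaled and rotated) space, where the actual distance measurement is performed. In contrast, with the Euclidean distance, all components contribute equally to the distance measure, without any scaling or other transformation.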


D.3.3 Numerical Aspects

For calculating the Mahalanobis distance (Eqn. (D.15)), the inverse of the covariance matrix (Sec. D.2) is needed. By definition, a covariance matrix $\Sigma$ is symmetric and its diagonal values are non-negative. Similarly (at least in theory), its inverse $\Sigma^{-1}$ should also be symmetric with non-negative diagonal values. This is necessary to ensure that the quantities under the square root in Eqn. (D.15) are never negative.

Unfortunately, $\Sigma$ is often ill-conditioned because of diagonal values that are very small or even zero. In this case, $\Sigma$ is not positive definite (as it should be), that is, one or more of its eigenvalues are zero or (due to rounding errors) negative. The inversion then becomes numerically unstable, and the resulting $\Sigma^{-1}$ may be non-symmetric. A simple remedy to this problem is to add a small quantity to the diagonal of the original covariance matrix $\Sigma$, that is,

$$ \bar{\Sigma} = \Sigma + \epsilon \cdot \mathbf{I} , \qquad \text{(D.18)} $$

to enforce positive definiteness, and to use $\bar{\Sigma}^{-1}$ in Eqn. (D.15).

A possible alternative is to calculate the eigendecomposition5 of $\Sigma$ in the form

$$ \Sigma = \mathbf{V} \cdot \Lambda \cdot \mathbf{V}^{\mathsf{T}} , \qquad \text{(D.19)} $$

where $\Lambda$ is a diagonal matrix containing the eigenvalues of $\Sigma$ (which may be zero or negative) and the columns of $\mathbf{V}$ are the corresponding eigenvectors. From this we create a modified diagonal matrix $\bar{\Lambda}$ by substituting all non-positive (or very small) eigenvalues with a small positive quantity $\epsilon$, that is,

$$ \bar{\Lambda}_{i,i} = \max(\Lambda_{i,i}, \epsilon) \qquad \text{(D.20)} $$

(typically $\epsilon \approx 10^{-6}$), and finally calculate the modified covariance matrix as

$$ \bar{\Sigma} = \mathbf{V} \cdot \bar{\Lambda} \cdot \mathbf{V}^{\mathsf{T}} , \qquad \text{(D.21)} $$

which should be positive definite. The (symmetric) inverse $\bar{\Sigma}^{-1}$ is then used in Eqn. (D.15).

5 See http://mathworld.wolfram.com/EigenDecomposition.html and the class EigenDecomposition in the Apache Commons Math library.
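For a symmetric 2×2 covariance matrix, the eigenvalue clamping of Eqns. (D.19) to (D.21) can be written out in closed form. This keeps the sketch free of external libraries; for larger matrices, a library eigensolver such as the `EigenDecomposition` class mentioned above would be the natural choice. The Java code below is illustrative. It obtains the eigenvectors from a single Jacobi rotation angle, and the names are hypothetical.

```java
/** Eigenvalue clamping after Eqns. (D.19)-(D.21), written out for a symmetric 2x2 matrix. */
final class CovarianceConditioning {

    /** Returns V * diag(max(lambda_i, eps)) * V^T for the symmetric matrix {{a, b}, {b, c}}. */
    static double[][] clampEigenvalues(double a, double b, double c, double eps) {
        double theta = 0.5 * Math.atan2(2 * b, a - c);   // rotation angle that diagonalizes the matrix
        double cs = Math.cos(theta);
        double sn = Math.sin(theta);
        // Eigenvalues belonging to the eigenvectors v1 = (cs, sn) and v2 = (-sn, cs):
        double l1 = a * cs * cs + 2 * b * sn * cs + c * sn * sn;
        double l2 = a * sn * sn - 2 * b * sn * cs + c * cs * cs;
        l1 = Math.max(l1, eps);                          // Eqn. (D.20)
        l2 = Math.max(l2, eps);
        // Reassemble V * Lambda * V^T = l1 * v1 * v1^T + l2 * v2 * v2^T, Eqn. (D.21):
        double s00 = l1 * cs * cs + l2 * sn * sn;
        double s01 = (l1 - l2) * cs * sn;
        double s11 = l1 * sn * sn + l2 * cs * cs;
        return new double[][] {{s00, s01}, {s01, s11}};
    }
}
```

For the singular matrix {{1, 1}, {1, 1}} (eigenvalues 2 and 0) with ε = 10^-6, the result has determinant 2·10^-6 and is therefore invertible. An already positive definite input is returned unchanged, up to rounding.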


D.3.4 Pre-Mapping Data for Efficient Mahalanobis Matching

Assume that we have a large set of sample vectors ("database") $X = (\boldsymbol{x}_0, \ldots, \boldsymbol{x}_{n-1})$ that shall be frequently queried for the instance most similar (i.e., closest) to a given search sample $\boldsymbol{x}_s$. Assuming that the search through X is performed linearly, we would need to calculate $d_M(\boldsymbol{x}_s, \boldsymbol{x}_i)$, using Eqn. (D.15), for all elements $\boldsymbol{x}_i$ in X.

One way to accelerate the matching is to apply the transformation defined by $\Sigma^{-1}$ to the entire data set only once, such that the Euclidean norm alone can be used for the distance calculation. For the sake of simplicity we write


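$$ d_M^2(\boldsymbol{x}_a, \boldsymbol{x}_b) = \|\boldsymbol{x}_a - \boldsymbol{x}_b\|_M^2 = \|\boldsymbol{y}\|_M^2 , \qquad \text{(D.22)} $$

with the difference vector $\boldsymbol{y} = \boldsymbol{x}_a - \boldsymbol{x}_b$, such that Eqn. (D.15) becomes

$$ \|\boldsymbol{y}\|_M^2 = \boldsymbol{y}^{\mathsf{T}} \cdot \Sigma^{-1} \cdot \boldsymbol{y} . \qquad \text{(D.23)} $$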


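The goal is to find a transformation $\mathbf{U}$ such that we can calculate the Mahalanobis distance directly from the transformed vectors

$$ \tilde{\boldsymbol{y}} = \mathbf{U} \cdot \boldsymbol{y} , \qquad \text{(D.24)} $$

by using the ordinary Euclidean norm $\|\cdot\|_2$ instead, that is, in the form

$$ \|\boldsymbol{y}\|_M^2 = \|\tilde{\boldsymbol{y}}\|_2^2 = \tilde{\boldsymbol{y}}^{\mathsf{T}} \cdot \tilde{\boldsymbol{y}} \qquad \text{(D.25)} $$

$$ = (\mathbf{U} \cdot \boldsymbol{y})^{\mathsf{T}} \cdot (\mathbf{U} \cdot \boldsymbol{y}) = (\boldsymbol{y}^{\mathsf{T}} \cdot \mathbf{U}^{\mathsf{T}}) \cdot (\mathbf{U} \cdot \boldsymbol{y}) \qquad \text{(D.26)} $$

$$ = \boldsymbol{y}^{\mathsf{T}} \cdot \mathbf{U}^{\mathsf{T}} \cdot \mathbf{U} \cdot \boldsymbol{y} = \boldsymbol{y}^{\mathsf{T}} \cdot \Sigma^{-1} \cdot \boldsymbol{y} . \qquad \text{(D.27)} $$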


While we do not know the matrix $\mathbf{U}$ yet, we see from Eqn. (D.27) that it must satisfy

$$ \mathbf{U}^{\mathsf{T}} \cdot \mathbf{U} = \Sigma^{-1} . \qquad \text{(D.28)} $$

Fortunately, since $\Sigma^{-1}$ is symmetric and positive definite, such a decomposition of $\Sigma^{-1}$ always exists.


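The standard method for calculating $\mathbf{U}$ in Eqn. (D.28) is the Cholesky decomposition,6 which can factorize any symmetric, positive definite matrix $\mathbf{A}$ in the form

$$ \mathbf{A} = \mathbf{L} \cdot \mathbf{L}^{\mathsf{T}} \quad\text{or}\quad \mathbf{A} = \mathbf{U}^{\mathsf{T}} \cdot \mathbf{U} , \qquad \text{(D.29)} $$

where $\mathbf{L}$ is a lower-triangular matrix or, alternatively, $\mathbf{U} = \mathbf{L}^{\mathsf{T}}$ is an upper-triangular matrix (the second variant is the one we need).7 Since the transformation of the difference vectors $\boldsymbol{y} \mapsto \mathbf{U} \cdot \boldsymbol{y}$ is a linear operation, the result is the same if we apply the transformation individually to the original vectors, that is,

$$ \tilde{\boldsymbol{y}} = \mathbf{U} \cdot \boldsymbol{y} = \mathbf{U} \cdot (\boldsymbol{x}_a - \boldsymbol{x}_b) = \mathbf{U} \cdot \boldsymbol{x}_a - \mathbf{U} \cdot \boldsymbol{x}_b . \qquad \text{(D.30)} $$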


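This means that, given the transformation $\mathbf{U}$, we can obtain the Mahalanobis distance between two points $\boldsymbol{x}_a, \boldsymbol{x}_b$ (as defined in Eqn. (D.15)) by simply calculating the Euclidean distance in the form

$$ d_M(\boldsymbol{x}_a, \boldsymbol{x}_b) = \|\mathbf{U} \cdot (\boldsymbol{x}_a - \boldsymbol{x}_b)\|_2 = \|\mathbf{U} \cdot \boldsymbol{x}_a - \mathbf{U} \cdot \boldsymbol{x}_b\|_2 . \qquad \text{(D.31)} $$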


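In summary, this suggests the following solution to a large-database Mahalanobis matching problem:

1. Calculate the covariance matrix $\Sigma$ for the original dataset $X = (\boldsymbol{x}_0, \ldots, \boldsymbol{x}_{n-1})$.

2. Condition $\Sigma$, such that it is positive definite (see Sec. D.3.3).

3. Find the matrix $\mathbf{U}$, such that $\mathbf{U}^{\mathsf{T}} \cdot \mathbf{U} = \Sigma^{-1}$ (by Cholesky decomposition of $\Sigma^{-1}$).

4. Transform all samples of the original data set $X = (\boldsymbol{x}_0, \ldots, \boldsymbol{x}_{n-1})$ to $\tilde{X} = (\tilde{\boldsymbol{x}}_0, \ldots, \tilde{\boldsymbol{x}}_{n-1})$, with $\tilde{\boldsymbol{x}}_k = \mathbf{U} \cdot \boldsymbol{x}_k$. This now becomes the actual "database".

5. Apply the same transformation to the search sample $\boldsymbol{x}_s$, that is, calculate $\tilde{\boldsymbol{x}}_s = \mathbf{U} \cdot \boldsymbol{x}_s$.

6. Find the index l of the best-matching element in $\tilde{X}$ (in terms of the Mahalanobis distance) by calculating the Euclidean (!) distance between the transformed vectors, that is

$$ l = \operatorname*{argmin}_{0 \le k < n} \|\tilde{\boldsymbol{x}}_s - \tilde{\boldsymbol{x}}_k\|_2 . \qquad \text{(D.32)} $$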


Since the matching is now performed with the ordinary Euclidean distance and the Mahalanobis calculation is not required during the search, the savings should be substantial. Each comparison costs O(m) instead of the O(m²) needed for the quadratic form in Eqn. (D.15). Also, this opens an easy path to the use of advanced, tree-based matching techniques, such as the common k-nearest neighbor methods.
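Steps 3 to 6 above can be sketched in plain Java without external libraries. In the sketch below, a textbook Cholesky-Banachiewicz factorization stands in for a library class such as the `CholeskyDecomposition` mentioned in footnote 7. The class and method names are hypothetical. The inverse covariance matrix is assumed to be given and already conditioned as in Sec. D.3.3.

```java
/** Sketch of steps 3-6: U with U^T * U = Sigma^{-1} via Cholesky, then Euclidean matching. */
final class MahalanobisPreMapping {

    /**
     * Returns the upper-triangular U with A = U^T * U (i.e., U = L^T for A = L * L^T).
     * Throws if A is not positive definite.
     */
    static double[][] choleskyUpper(double[][] A) {
        int m = A.length;
        double[][] L = new double[m][m];
        for (int i = 0; i < m; i++) {
            for (int j = 0; j <= i; j++) {
                double s = A[i][j];
                for (int k = 0; k < j; k++) {
                    s -= L[i][k] * L[j][k];
                }
                if (i == j) {
                    if (s <= 0) {
                        throw new ArithmeticException("matrix is not positive definite");
                    }
                    L[i][i] = Math.sqrt(s);
                } else {
                    L[i][j] = s / L[j][j];
                }
            }
        }
        double[][] U = new double[m][m];       // U = L^T
        for (int i = 0; i < m; i++) {
            for (int j = 0; j < m; j++) {
                U[i][j] = L[j][i];
            }
        }
        return U;
    }

    /** Maps a sample x to U * x (steps 4 and 5). */
    static double[] map(double[][] U, double[] x) {
        int m = x.length;
        double[] y = new double[m];
        for (int i = 0; i < m; i++) {
            for (int j = 0; j < m; j++) {
                y[i] += U[i][j] * x[j];
            }
        }
        return y;
    }

    /** Index of the mapped sample closest (Euclidean) to the mapped query (step 6, Eqn. (D.32)). */
    static int nearest(double[][] mappedData, double[] mappedQuery) {
        int best = -1;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int k = 0; k < mappedData.length; k++) {
            double d = 0;
            for (int i = 0; i < mappedQuery.length; i++) {
                double diff = mappedData[k][i] - mappedQuery[i];
                d += diff * diff;              // squared distance suffices for the argmin
            }
            if (d < bestDist) {
                bestDist = d;
                best = k;
            }
        }
        return best;
    }
}
```

Because the data set is mapped only once, each query costs a single matrix-vector product followed by plain Euclidean comparisons. The mapped vectors could equally be handed to a k-d tree or a similar spatial index.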

6  See  http://mathworld.wolfram.com/CholeskyDecomposition.html. 

7  The  Cholesky  decomposition  (CD)  requires  that  the  supplied  matrix 
A  is  symmetric  and  positive  definite,  otherwise  the  decomposition  will 
fail.  In  fact,  the  CD  itself  is  commonly  used  to  test  if  a  given  matrix  is 
positive  definite.  It  is  implemented  by  class  CholeskyDecomposition  of 
the  Apache  Commons  Math  library. 
D.4 The Gaussian Distribution

The Gaussian distribution plays a major role in decision theory, pattern recognition, and statistics in general, because of its convenient analytical properties. A continuous, scalar quantity X is said to be subject to a Gaussian distribution if the probability density of observing a particular value x is

$$ p(X = x) = p(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \cdot e^{-\frac{(x-\mu)^2}{2\sigma^2}} . \qquad \text{(D.33)} $$


The Gaussian distribution is completely defined by its mean $\mu$ and variance $\sigma^2$. The Gaussian distribution, also called a "normal" distribution, is commonly denoted in the form

$$ p(x) = \mathcal{N}(x \mid \mu, \sigma^2) \quad\text{or}\quad X \sim \mathcal{N}(\mu, \sigma^2) , \qquad \text{(D.34)} $$

saying that "X is normally distributed with parameters $\mu$ and $\sigma^2$." As required for any valid probability distribution,

$$ \mathcal{N}(x \mid \mu, \sigma^2) \ge 0 \quad\text{and}\quad \int_{-\infty}^{\infty} \mathcal{N}(x \mid \mu, \sigma^2) \, dx = 1 . \qquad \text{(D.35)} $$


Thus the area under the probability distribution curve is always one, that is, $\mathcal{N}(\cdot)$ is normalized. The Gaussian function in Eqn. (D.33) has its maximum height (called "mode") at position $x = \mu$, where its value is

$$ p(x = \mu) = \frac{1}{\sqrt{2\pi\sigma^2}} . \qquad \text{(D.36)} $$
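Eqn. (D.33) maps directly to a one-line density function. The Java sketch below uses a hypothetical name and is parameterized by the variance $\sigma^2$, as in the text, rather than by the standard deviation.

```java
/** Univariate normal density N(x | mu, sigma^2) after Eqn. (D.33). */
final class Gaussian {

    static double pdf(double x, double mu, double sigma2) {
        double d = x - mu;
        return Math.exp(-d * d / (2 * sigma2)) / Math.sqrt(2 * Math.PI * sigma2);
    }
}
```

At $x = \mu$ it returns the mode value of Eqn. (D.36). A simple numerical integration (e.g., the trapezoidal rule over $\mu \pm 10\sigma$) confirms the normalization in Eqn. (D.35).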


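If a random variable X is normally distributed with mean $\mu$ and variance $\sigma^2$, then the result of a linear mapping of the kind $X' = aX + b$ is again a random variable that is normally distributed, with parameters $\mu' = a \cdot \mu + b$ and $\sigma'^2 = a^2 \cdot \sigma^2$:

$$ X \sim \mathcal{N}(\mu, \sigma^2) \;\Rightarrow\; a \cdot X + b \sim \mathcal{N}(a \cdot \mu + b,\; a^2 \cdot \sigma^2) , \qquad \text{(D.37)} $$

for $a, b \in \mathbb{R}$ (with $a \neq 0$).

Moreover, if $X_1, X_2$ are statistically independent, normally distributed random variables with means $\mu_1, \mu_2$ and variances $\sigma_1^2, \sigma_2^2$, respectively, then a linear combination of the form $a_1 X_1 + a_2 X_2$ is again normally distributed with $\mu_{12} = a_1 \cdot \mu_1 + a_2 \cdot \mu_2$ and $\sigma_{12}^2 = a_1^2 \cdot \sigma_1^2 + a_2^2 \cdot \sigma_2^2$, that is,

$$ (a_1 X_1 + a_2 X_2) \sim \mathcal{N}(a_1 \cdot \mu_1 + a_2 \cdot \mu_2,\; a_1^2 \cdot \sigma_1^2 + a_2^2 \cdot \sigma_2^2) . \qquad \text{(D.38)} $$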


D.4.1  Maximum  Likelihood  Estimation 

The probability density function p(x) of a statistical distribution tells us how probable it is to observe the result x for some fixed distribution parameters, such as $\mu$ and $\sigma$ in the case of a normal distribution. If these parameters are unknown and need to be estimated,8 it is interesting to ask the reverse question:

8  As  required,  for  example,  for  “minimum  error  thresholding”  in  Chapter 
11,  Sec.  11.1.6. 




How likely are particular parameter values for a given set of empirical observations (assuming a certain type of distribution)?

This is (in a casual sense) what the term "likelihood" stands for. In particular, a distribution's likelihood function quantifies how probable a given (fixed) set of observations is under varying distribution parameters.

Note that the probability of observing the outcome x from the normal distribution,

$$ p(x) = p(x \mid \mu, \sigma^2) , \qquad \text{(D.39)} $$

is really a conditional probability, stating how probable it is to observe the value x from a given normal distribution with known parameters $\mu$ and $\sigma^2$. Conversely, a likelihood function for the normal distribution could be viewed as a conditional function

$$ L(\mu, \sigma^2 \mid x) = p(x \mid \mu, \sigma^2) , \qquad \text{(D.40)} $$

which quantifies the likelihood of $(\mu, \sigma^2)$ being the correct distribution parameters for a given observation x. The maximum likelihood method tries to find optimal parameters by maximizing the value of a distribution's likelihood function L.

If we draw two independent9 samples $x_a, x_b$ that are subject to the same distribution, their joint probability (i.e., the probability of $x_a$ and $x_b$ occurring together in the sample) is the product of their individual probabilities, that is,

$$ p(x_a \wedge x_b) = p(x_a) \cdot p(x_b) . \qquad \text{(D.41)} $$

In general, if we are given a vector of m independent observations $X = (x_0, x_1, \ldots, x_{m-1})$ from the same distribution, the probability of observing exactly this set of values is

$$ p(X) = p(x_0 \wedge x_1 \wedge \ldots \wedge x_{m-1}) = p(x_0) \cdot p(x_1) \cdot \ldots \cdot p(x_{m-1}) = \prod_{i=0}^{m-1} p(x_i) . \qquad \text{(D.42)} $$

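Thus, if the sample X originates from a normal distribution $\mathcal{N}$, a suitable likelihood function is

$$ L(\mu, \sigma^2 \mid X) = p(X \mid \mu, \sigma^2) = \prod_{i=0}^{m-1} \mathcal{N}(x_i \mid \mu, \sigma^2) \qquad \text{(D.43)} $$

$$ = \prod_{i=0}^{m-1} \frac{1}{\sqrt{2\pi\sigma^2}} \cdot e^{-\frac{(x_i - \mu)^2}{2\sigma^2}} . \qquad \text{(D.44)} $$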


The parameters $(\mu, \sigma^2)$ for which $L(\mu, \sigma^2 \mid X)$ is a maximum are called the maximum-likelihood estimate for X.
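In practice, the product in Eqn. (D.44) is usually evaluated in logarithmic form, because a product of many small density values can underflow in floating-point arithmetic. The Java sketch below (hypothetical names) computes this log-likelihood for a normal distribution. For the normal distribution it is a known result that the maximizing parameters are the sample mean and the variance with factor $1/n$ from Sec. D.1. The accompanying check compares a few nearby parameter pairs against that maximum.

```java
/** Log-likelihood of a sample under N(mu, sigma^2), i.e., the logarithm of Eqn. (D.44). */
final class GaussianLikelihood {

    static double logLikelihood(double[] xs, double mu, double sigma2) {
        double sum = 0;
        for (double x : xs) {
            double d = x - mu;
            // log of one factor: -0.5 * log(2 * pi * sigma^2) - (x - mu)^2 / (2 * sigma^2)
            sum += -0.5 * Math.log(2 * Math.PI * sigma2) - d * d / (2 * sigma2);
        }
        return sum;
    }
}
```

For the sample (1, 2, 4, 5), the log-likelihood is largest at $\mu = 3$, $\sigma^2 = 2.5$, compared with, for example, $(3.1, 2.5)$ or $(3, 3.0)$.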

Note that it is not necessary for a likelihood function to be a proper (i.e., normalized) probability distribution, since it is only necessary to calculate whether a particular set of distribution parameters is more probable than another. Thus the likelihood function L may be any monotonically increasing function of the corresponding probability p in Eqn. (D.43), in particular its logarithm, which is commonly used to avoid multiplying small values.

9 Although this assumption is often violated, independence is important to keep statistical problems simple and tractable. In particular, the values of adjacent image pixels are usually not independent.

D.4.2 Gaussian Mixtures

In practice, probabilistic models are often too complex to be described by a single Gaussian (or other standard) distribution. Without losing the mathematical convenience of Gaussian models, highly complex distributions can be modeled as combinations of multiple Gaussian distributions with different parameters. Such a Gaussian mixture model is a linear superposition of K Gaussian distributions of the form


$$p(x) = \sum_{j=0}^{K-1} \pi_j \cdot \mathcal{N}(x \mid \mu_j, \sigma_j^2), \tag{D.45}$$


where the weights ("mixing coefficients") $\pi_j$ express the probability that an event $x$ was generated by the $j$th component (with $\sum_{j=0}^{K-1} \pi_j = 1$).¹⁰ The interpretation of this mixture model is that there are $K$ independent Gaussian "components" (each with its parameters $\mu_j$, $\sigma_j^2$) that contribute to a common stream of events $x_i$. If a particular value $x$ is observed, it is assumed to be the result of exactly one of the $K$ components, but the identity of that component is unknown.

Assume,  as  a  special  case,  that  a  probability  distribution  p(x)  is 
the  superposition  (mixture)  of  two  Gaussian  distributions,  that  is, 


$$p(x) = \pi_a \cdot \mathcal{N}(x \mid \mu_a, \sigma_a^2) + \pi_b \cdot \mathcal{N}(x \mid \mu_b, \sigma_b^2). \tag{D.46}$$


Any observed value $x$ is assumed to be generated by either the first component (with $\mu_a$, $\sigma_a^2$ and prior probability $\pi_a$) or the second component (with $\mu_b$, $\sigma_b^2$ and prior probability $\pi_b$). These parameters as well as the prior probabilities are unknown but can be estimated by maximizing the likelihood function $L$. Note that, in general, the unknown parameters cannot be calculated in closed form but only with numerical methods. For further details and solution techniques see [24,64,228], for example.
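To make the generative interpretation concrete, the following Java sketch (a hypothetical helper class, not part of the book's code) evaluates the mixture density of Eqn. (D.45) and draws samples by first picking a component $j$ with probability $\pi_j$ and then drawing a value from $\mathcal{N}(\mu_j, \sigma_j^2)$. It does not estimate the parameters; that would require a numerical method such as the expectation-maximization (EM) algorithm.

```java
import java.util.Random;

class GaussianMixture {
    final double[] pi, mu, sigma2;   // mixing coefficients, means, variances

    GaussianMixture(double[] pi, double[] mu, double[] sigma2) {
        this.pi = pi;
        this.mu = mu;
        this.sigma2 = sigma2;
    }

    // Normal density N(x | mu, sigma2)
    static double normal(double x, double mu, double sigma2) {
        double d = x - mu;
        return Math.exp(-d * d / (2 * sigma2)) / Math.sqrt(2 * Math.PI * sigma2);
    }

    // Mixture density p(x), Eqn. (D.45)
    double density(double x) {
        double p = 0;
        for (int j = 0; j < pi.length; j++)
            p += pi[j] * normal(x, mu[j], sigma2[j]);
        return p;
    }

    // One sample: choose component j with probability pi[j], then sample from it
    double sample(Random rnd) {
        double r = rnd.nextDouble();
        int j = 0;
        double acc = pi[0];
        while (r >= acc && j < pi.length - 1) {
            j++;
            acc += pi[j];
        }
        return mu[j] + Math.sqrt(sigma2[j]) * rnd.nextGaussian();
    }
}
```

For the two-component case of Eqn. (D.46), the sample mean approaches $\pi_a \mu_a + \pi_b \mu_b$, while the density integrates to 1 as long as the weights sum to 1.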


D.4.3  Creating  Gaussian  Noise 

Synthetic Gaussian noise is often used for testing in image processing, particularly for assessing the quality of smoothing filters. While the generation of pseudo-random values that follow a Gaussian distribution is not a trivial task in general,¹¹ it is readily implemented in Java by the standard class Random. For example, the Java method addGaussianNoise() in Prog. D.1 adds Gaussian noise with zero mean ($\mu = 0$) and standard deviation sigma ($\sigma$) to a grayscale image I of type FloatProcessor (ImageJ). The random values produced by successive calls to the method nextGaussian() in line 10 follow a Gaussian distribution $\mathcal{N}(0, 1)$, with mean $\mu = 0$ and variance $\sigma^2 = 1$. As implied by Eqn. (D.37),

$$X \sim \mathcal{N}(0, 1) \;\Rightarrow\; a + s \cdot X \sim \mathcal{N}(a, s^2), \tag{D.47}$$

and thus scaling the results from nextGaussian() by $s$ and shifting them by $a$ yields a random variable that is normally distributed with $\mathcal{N}(a, s^2)$.

10 The weight $\pi_j$ is also called the prior probability of the component $j$.

11 Typically the so-called polar method is used for generating Gaussian random values [138, Sec. 3.4.1].


 1  import java.util.Random;
 2  import ij.process.FloatProcessor;
 3  void addGaussianNoise (FloatProcessor I, double sigma) {
 4    int w = I.getWidth();
 5    int h = I.getHeight();
 6    Random rnd = new Random();
 7    for (int v = 0; v < h; v++) {
 8      for (int u = 0; u < w; u++) {
 9        float val = I.getf(u, v);
10        float noise = (float) (rnd.nextGaussian() * sigma);  // sample from N(0, sigma^2)
11        I.setf(u, v, val + noise);
12      }
13    }
14  }

Prog. D.1
Java method for adding Gaussian noise to an image of type FloatProcessor.
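Footnote 11 mentions the polar method for generating Gaussian random values. The following Java sketch implements that method (in Marsaglia's formulation) on top of uniformly distributed values from Random.nextDouble() and then applies Eqn. (D.47) to obtain values from $\mathcal{N}(a, s^2)$. It is meant for illustration only; in practice, Random.nextGaussian() as used in Prog. D.1 is the simpler choice. Class and method names are hypothetical.

```java
import java.util.Random;

class PolarGaussian {
    private final Random rnd;
    private double spare;             // second value of the last accepted pair
    private boolean hasSpare = false;

    PolarGaussian(Random rnd) {
        this.rnd = rnd;
    }

    // One N(0,1) value; each accepted point yields two independent values
    double next() {
        if (hasSpare) {
            hasSpare = false;
            return spare;
        }
        double v1, v2, s;
        do {
            v1 = 2 * rnd.nextDouble() - 1;   // uniform in [-1, 1)
            v2 = 2 * rnd.nextDouble() - 1;
            s = v1 * v1 + v2 * v2;
        } while (s >= 1 || s == 0);          // reject points outside the unit disk
        double m = Math.sqrt(-2 * Math.log(s) / s);
        spare = v2 * m;
        hasSpare = true;
        return v1 * m;
    }

    // Value from N(a, s^2) via Eqn. (D.47): a + s * X with X ~ N(0,1)
    double next(double a, double s) {
        return a + s * next();
    }
}
```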
Appendix E


Gaussian  Filters 


This  part  supplements  the  material  presented  in  Ch.  25  (SIFT). 

E.1 Cascading Gaussian Filters

To compute a Gaussian scale space efficiently (as used in the SIFT method, for example), the scale layers are usually not obtained directly from the input image by smoothing with Gaussians of increasing size. Instead, each layer can be calculated recursively from the previous layer by filtering with relatively small Gaussians. Thus, the entire scale space is implemented as a concatenation or "cascade" of smaller Gaussian filters.¹

If Gaussian filters of sizes $\sigma_1, \sigma_2$ are applied successively to the same image, the resulting smoothing effect is identical to using a single larger Gaussian filter, that is,

$$(I * H^{G}_{\sigma_1}) * H^{G}_{\sigma_2} = I * (H^{G}_{\sigma_1} * H^{G}_{\sigma_2}) = I * H^{G}_{\sigma}, \tag{E.1}$$

with $\sigma = \sqrt{\sigma_1^2 + \sigma_2^2}$ being the size of the resulting combined Gaussian filter $H^{G}_{\sigma}$ [129, Sec. 4.5.4]. In other words, the variances (squares of the $\sigma$ values) of successive Gaussian filters add up, that is,

$$\sigma^2 = \sigma_1^2 + \sigma_2^2. \tag{E.2}$$

In the special case of the same Gaussian filter being applied twice ($\sigma_1 = \sigma_2$), the effective width of the combined filter is $\sigma = \sqrt{2} \cdot \sigma_1$.
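The cascading property of Eqns. (E.1) and (E.2) can be checked numerically on sampled kernels. The following Java sketch works in 1D for simplicity (Gaussian filters are separable) and uses hypothetical names; truncating the kernels at $\pm 5\sigma$ and normalizing them to unit sum are assumptions of this sketch, not prescriptions from the text. It convolves two kernels and measures the variance of the result:

```java
class GaussCascade {
    // Sampled 1D Gaussian kernel, truncated at radius ceil(5 sigma), normalized to sum 1
    static double[] gaussKernel(double sigma) {
        int r = (int) Math.ceil(5 * sigma);
        double[] h = new double[2 * r + 1];
        double sum = 0;
        for (int i = -r; i <= r; i++) {
            h[i + r] = Math.exp(-(i * i) / (2 * sigma * sigma));
            sum += h[i + r];
        }
        for (int i = 0; i < h.length; i++)
            h[i] /= sum;
        return h;
    }

    // Full discrete convolution of two kernels (result length a + b - 1)
    static double[] convolve(double[] a, double[] b) {
        double[] c = new double[a.length + b.length - 1];
        for (int i = 0; i < a.length; i++)
            for (int j = 0; j < b.length; j++)
                c[i + j] += a[i] * b[j];
        return c;
    }

    // Variance of a centered kernel of odd length (kernel viewed as a distribution)
    static double variance(double[] h) {
        int r = h.length / 2;
        double v = 0;
        for (int i = 0; i < h.length; i++)
            v += (i - r) * (double) (i - r) * h[i];
        return v;
    }
}
```

For $\sigma_1 = 2$ and $\sigma_2 = 3$, the variance of convolve(gaussKernel(2), gaussKernel(3)) is very close to $2^2 + 3^2 = 13$, and its center value matches that of gaussKernel(Math.sqrt(13)); the small residual differences stem from sampling and truncation.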

E.2  Gaussian  Filters  and  Scale  Space 

In a Gaussian scale space, the scale corresponding to each level is proportional to the width ($\sigma$) of the Gaussian filter required to derive this level from the original (completely unsmoothed) image. Given an image that is already pre-smoothed by a Gaussian filter of width $\sigma_1$ and should be smoothed to some target scale $\sigma_2 > \sigma_1$, the required width of the additional Gaussian filter is

$$\sigma_d = \sqrt{\sigma_2^2 - \sigma_1^2}. \tag{E.3}$$

1 See Chapter 25, Sec. 25.1.1 for details.
Usually the neighboring layers of the scale space differ by a constant scale factor ($\kappa$) and the transformation from one scale level to another can be accomplished by successively applying Gaussian filters. Despite the constant scale factor, however, the width of the required filters is not constant but depends on the image's initial scale. In particular, if we want to transform an image with scale $\sigma_0$ by a factor $\kappa$ to a new scale $\kappa \cdot \sigma_0$, then (from Eqn. (E.2)) for $\sigma_d$ the relation

$$(\kappa \cdot \sigma_0)^2 = \sigma_0^2 + \sigma_d^2 \tag{E.4}$$

must hold. Thus, the width $\sigma_d$ of the required Gaussian smoothing filter is


$$\sigma_d = \sigma_0 \cdot \sqrt{\kappa^2 - 1}. \tag{E.5}$$

For example, doubling the scale ($\kappa = 2$) of an image that is pre-smoothed with $\sigma_0$ requires a Gaussian filter of width $\sigma_d = \sigma_0 \cdot (2^2 - 1)^{1/2} = \sigma_0 \cdot \sqrt{3} \approx \sigma_0 \cdot 1.732$.
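The following Java sketch (hypothetical names) applies Eqns. (E.3) and (E.5) to a cascade of scale levels $\sigma_k = \sigma_0 \cdot \kappa^k$ and returns the width of the incremental filter needed between consecutive levels. It shows that these widths grow by the same factor $\kappa$ as the levels themselves:

```java
class ScaleSpaceSteps {
    // Width of the extra Gaussian needed to go from scale s1 to scale s2 >= s1, Eqn. (E.3)
    static double incrementalSigma(double s1, double s2) {
        if (s2 < s1)
            throw new IllegalArgumentException("target scale must not be smaller");
        return Math.sqrt(s2 * s2 - s1 * s1);
    }

    // Incremental filter widths between levels sigma_k = sigma0 * kappa^k, k = 0..levels-1
    static double[] cascadeSigmas(double sigma0, double kappa, int levels) {
        double[] sd = new double[levels - 1];
        double s = sigma0;
        for (int k = 0; k < levels - 1; k++) {
            double next = s * kappa;
            sd[k] = incrementalSigma(s, next);   // equals s * sqrt(kappa^2 - 1), Eqn. (E.5)
            s = next;
        }
        return sd;
    }
}
```

With $\sigma_0 = 1.6$ and $\kappa = 2$, the two filters needed for three levels have widths $1.6\sqrt{3}$ and $3.2\sqrt{3}$; applying them in sequence to the base level yields $\sqrt{1.6^2 + 3 \cdot 1.6^2 + 3 \cdot 3.2^2} = 6.4$, as expected from Eqn. (E.2).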


E.3 Effects of Gaussian Filtering in the Frequency Domain


For the 1D Gaussian function

$$g_\sigma(x) = \frac{1}{\sigma\sqrt{2\pi}} \cdot e^{-\frac{x^2}{2\sigma^2}}, \tag{E.6}$$


the continuous Fourier transform² $\mathcal{F}(g_\sigma)$ is

$$G_\sigma(\omega) = \frac{1}{\sqrt{2\pi}} \cdot e^{-\frac{\omega^2 \sigma^2}{2}}. \tag{E.7}$$


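Doubling the width ($\sigma$) of a Gaussian filter corresponds to cutting the bandwidth in half. If $\sigma$ is doubled, the Fourier transform becomes

$$G_{2\sigma}(\omega) = \frac{1}{\sqrt{2\pi}} \cdot e^{-\frac{\omega^2 (2\sigma)^2}{2}} = \frac{1}{\sqrt{2\pi}} \cdot e^{-\frac{4\omega^2 \sigma^2}{2}} \tag{E.8}$$

$$= \frac{1}{\sqrt{2\pi}} \cdot e^{-\frac{(2\omega)^2 \sigma^2}{2}} = G_\sigma(2\omega) \tag{E.9}$$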


and, in general, when scaling the filter by a factor $k$,

$$G_{k\sigma}(\omega) = G_\sigma(k\omega). \tag{E.10}$$
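Eqns. (E.7) and (E.10) can be checked numerically. The following Java sketch (hypothetical names) assumes the symmetric Fourier convention $G(\omega) = \frac{1}{\sqrt{2\pi}} \int g(x)\, e^{-i\omega x}\, dx$, which produces the factor $1/\sqrt{2\pi}$ in Eqn. (E.7). Because $g_\sigma$ is even, only the cosine part of the integral remains, which is approximated here by a simple sum over $x \in [-12\sigma, 12\sigma]$:

```java
class GaussFourier {
    // 1D Gaussian g_sigma(x), Eqn. (E.6)
    static double g(double x, double sigma) {
        return Math.exp(-x * x / (2 * sigma * sigma)) / (sigma * Math.sqrt(2 * Math.PI));
    }

    // Closed-form transform G_sigma(omega), Eqn. (E.7)
    static double closedForm(double omega, double sigma) {
        return Math.exp(-omega * omega * sigma * sigma / 2) / Math.sqrt(2 * Math.PI);
    }

    // Numerical transform (1/sqrt(2 pi)) * sum g(x) cos(omega x) dx
    static double numeric(double omega, double sigma) {
        int n = 1200;                  // samples on each side of the origin
        double dx = sigma / 100;       // step size, so x covers [-12 sigma, 12 sigma]
        double sum = 0;
        for (int i = -n; i <= n; i++) {
            double x = i * dx;
            sum += g(x, sigma) * Math.cos(omega * x);
        }
        return sum * dx / Math.sqrt(2 * Math.PI);
    }
}
```

In line with Eqn. (E.10), numeric(0.7, 2.0) agrees with closedForm(1.4, 1.0): widening the kernel by $k = 2$ evaluates the original transform at $2\omega$.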


That is, if $\sigma$ is increased (or the kernel widened) by a factor $k$, the corresponding Fourier transform gets contracted by the same factor. In terms of linear filtering this means that widening the kernel by some factor $k$ reduces the resulting signal bandwidth by the same factor $k$.


2 See also Chapter 18, Sec. 18.1.


E.4  LoG-Approximation  by  the  DoG 




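The 2D LoG kernel (see Ch. 25, Sec. 25.1.1),

$$L_\sigma(x,y) = (\nabla^2 G_\sigma)(x,y) = \frac{1}{\pi\sigma^4} \left( \frac{x^2 + y^2}{2\sigma^2} - 1 \right) \cdot e^{-\frac{x^2 + y^2}{2\sigma^2}}, \tag{E.11}$$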


has a (negative) peak at the origin with the associated function value

$$L_\sigma(0,0) = -\frac{1}{\pi\sigma^4}. \tag{E.12}$$

Thus, the scale-normalized LoG kernel, defined in Eqn. (25.10) as

$$\hat{L}_\sigma(x,y) = \sigma^2 \cdot L_\sigma(x,y), \tag{E.13}$$

has the peak value

$$\hat{L}_\sigma(0,0) = -\frac{1}{\pi\sigma^2} \tag{E.14}$$

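at the origin. In comparison, for a given scale factor $\kappa$, the unscaled DoG function

$$\mathrm{DoG}_{\sigma,\kappa}(x,y) = G_{\kappa\sigma}(x,y) - G_\sigma(x,y) = \frac{1}{2\pi\kappa^2\sigma^2} \cdot e^{-\frac{x^2+y^2}{2\kappa^2\sigma^2}} - \frac{1}{2\pi\sigma^2} \cdot e^{-\frac{x^2+y^2}{2\sigma^2}} \tag{E.15}$$

has a peak value

$$\mathrm{DoG}_{\sigma,\kappa}(0,0) = -\frac{\kappa^2 - 1}{2\pi\kappa^2\sigma^2}. \tag{E.16}$$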


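By scaling the DoG function by some factor $\lambda$ to match the LoG's center peak value, such that $L_\sigma(0,0) = \lambda \cdot \mathrm{DoG}_{\sigma,\kappa}(0,0)$, the original LoG (Eqn. (E.11)) is approximated by the DoG in the form

$$L_\sigma(x,y) \approx \frac{2\kappa^2}{\sigma^2(\kappa^2 - 1)} \cdot \mathrm{DoG}_{\sigma,\kappa}(x,y). \tag{E.17}$$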


Similarly, the scale-normalized LoG (Eqn. (E.13)) is approximated by the DoG as³

$$\hat{L}_\sigma(x,y) \approx \frac{2\kappa^2}{\kappa^2 - 1} \cdot \mathrm{DoG}_{\sigma,\kappa}(x,y). \tag{E.18}$$

Since the factor in Eqn. (E.18) depends on $\kappa$ only, the DoG approximation is (for a constant size ratio $\kappa$) implicitly proportional to the scale-normalized LoG for any scale $\sigma$.


3 A different formulation, $\hat{L}_\sigma(x,y) \approx \frac{1}{\kappa - 1} \cdot \mathrm{DoG}_{\sigma,\kappa}(x,y)$, is given in [153], which is the same as Eqn. (E.18) for $\kappa \to 1$, but not for $\kappa > 1$. The essence is that the leading factor is constant and independent of $\sigma$, and can thus be ignored when comparing the magnitude of the filter responses at varying scales.
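The relations in Eqns. (E.11) through (E.18) can be evaluated pointwise. The following Java sketch (hypothetical names) implements the LoG, its scale-normalized form, the DoG, and the DoG approximation of Eqn. (E.18). By construction, the approximation matches exactly at the origin; away from the origin it becomes accurate as $\kappa$ approaches 1:

```java
class LogDog {
    // 2D Gaussian G_sigma(x, y) with unit integral
    static double gauss(double x, double y, double sigma) {
        double s2 = sigma * sigma;
        return Math.exp(-(x * x + y * y) / (2 * s2)) / (2 * Math.PI * s2);
    }

    // LoG kernel L_sigma(x, y), Eqn. (E.11)
    static double laplacianOfGaussian(double x, double y, double sigma) {
        double s2 = sigma * sigma, r2 = x * x + y * y;
        return (r2 / (2 * s2) - 1) * Math.exp(-r2 / (2 * s2)) / (Math.PI * s2 * s2);
    }

    // Scale-normalized LoG, Eqn. (E.13)
    static double normalizedLoG(double x, double y, double sigma) {
        return sigma * sigma * laplacianOfGaussian(x, y, sigma);
    }

    // DoG_{sigma,kappa}(x, y), Eqn. (E.15)
    static double dog(double x, double y, double sigma, double kappa) {
        return gauss(x, y, kappa * sigma) - gauss(x, y, sigma);
    }

    // DoG approximation of the scale-normalized LoG, Eqn. (E.18)
    static double normalizedLoGApprox(double x, double y, double sigma, double kappa) {
        double k2 = kappa * kappa;
        return 2 * k2 / (k2 - 1) * dog(x, y, sigma, kappa);
    }
}
```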
Appendix F


Java  Notes 


As a text for undergraduate engineering curricula, this book assumes basic programming skills in a procedural language, such as Java, C#, or C. The examples in the main text should be easy to understand with the help of an introductory book on Java or one of the many online tutorials. Experience shows, however, that difficulties with some basic Java concepts persist and often cause complications, even at higher levels. The following sections address some of these typical problem spots.


F.1  Arithmetic 

Java  is  a  “strongly  typed”  programming  language,  which  means  in 
particular  that  any  variable  has  a  fixed  type  that  cannot  be  altered 
dynamically.  Also,  the  result  of  an  expression  is  determined  by  the 
types  of  the  involved  operands  and  not  (in  the  case  of  an  assignment) 
by  the  type  of  the  “receiving”  variable. 

F.1.1  Integer  Division 

Division involving integer operands is a frequent cause of errors. If the variables a and b are both of type int, then the expression a/b is evaluated according to the rules of integer division. The result (for positive operands, the number of times b is fully contained in a; in general, the exact quotient truncated toward zero) is again of type int. For example, after the Java statements

int  a  =  2; 
int  b  =  5; 

double  c  =  a  /  b;  //  resulting  value  of  c  is  zero! 

the value of c is not 0.4 but 0.0 because the expression a/b on the right yields the int value 0, which is then automatically converted to the double value 0.0.

If we wanted to evaluate a/b as a floating-point operation (as most pocket calculators do), at least one of the involved operands must be converted to a floating-point value, such as by an explicit type cast, for example,

double  c  =  (double)  a  /  b;  //  value  of  c  is  0.4 

or  alternatively 

double  c  =  a  /  (double)  b;  //  value  of  c  is  0.4 


Example 


Assume, for example, that we want to scale any pixel value $a$ of an image such that the maximum pixel value $a_{\max}$ is mapped to 255 (see Ch. 4). In mathematical notation, the scaling of the pixel values is simply expressed as

$$c \leftarrow \frac{a}{a_{\max}} \cdot 255$$

and it may be tempting to convert this 1:1 into Java code, such as

int a_max = (int) ip.getMaxValue();
for (int v = 0; v < ip.getHeight(); v++) {
  for (int u = 0; u < ip.getWidth(); u++) {
    int a = ip.getPixel(u, v);
    int c = (a / a_max) * 255;   // ← problem!
    ip.putPixel(u, v, c);
  }
}


As we can easily predict, the resulting image will be all black (zero values), except for those pixels whose value was a_max originally (they are set to 255). The reason is again that the division a/a_max has two operands of type int, and the result is thus zero whenever the denominator (a_max) is greater than the numerator (a).

Of course, the entire operation could be performed in the floating-point domain by converting one of the operands (as we have shown), but this is not even necessary in this case. Instead, we may simply swap the order of operations and start with the multiplication:

int  c  =  a  *  255  /  a_max; 

Why does this work now? The subexpression a * 255 is evaluated first,¹ so the subsequent (integer) division operates on a large intermediate value and discards only the fractional part of the exact result. Nevertheless, rounding should always be considered to obtain more accurate results when computing fractions of integers (see Sec. F.1.5).
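The following Java sketch (hypothetical method names) contrasts the variants discussed above and adds one common way of rounding an integer fraction: adding half the divisor before dividing, which is valid for non-negative operands only. It assumes that a * 255 does not overflow the int range, which holds for typical pixel values:

```java
class IntScaling {
    // Floating-point division: one operand is cast to double first
    static double ratio(int a, int b) {
        return (double) a / b;
    }

    // Literal translation: a / aMax is an integer division and yields 0 for a < aMax
    static int scaleWrong(int a, int aMax) {
        return (a / aMax) * 255;
    }

    // Multiplication first: (a * 255) / aMax, truncated toward zero
    static int scaleTruncated(int a, int aMax) {
        return a * 255 / aMax;
    }

    // Rounded result for a >= 0 and aMax > 0: add aMax / 2 before dividing
    static int scaleRounded(int a, int aMax) {
        return (a * 255 + aMax / 2) / aMax;
    }
}
```

For a = 100 and a_max = 200, the exact value is 127.5: scaleWrong() yields 0, scaleTruncated() yields 127, and scaleRounded() yields 128.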

F.1.2  Modulus  Operator 

The result of the modulus operator a mod b (used in several places in the main text) is defined [92, p. 82] as the remainder of the "floored" division a/b,

$$a \bmod b = \begin{cases} a & \text{for } b = 0, \\ a - b \cdot \lfloor a/b \rfloor & \text{otherwise}, \end{cases} \tag{F.1}$$
1 In Java, operators of the same precedence (such as * and /) are evaluated in left-to-right order, and therefore no parentheses are required in this example (though they would do no harm either).


for $a, b \in \mathbb{R}$. This type of operator or library method was not available in the standard Java API until recently.² The following Java method implements the mod operation according to the definition in Eqn. (F.1):³

int Mod(int a, int b) {
  if (b == 0)
    return a;
  int r = a % b;                      // remainder; takes the sign of a
  if (r != 0 && (r < 0) != (b < 0))
    r = r + b;                        // adjust to floored division: sign of b
  return r;
}

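Note that the remainder operator %, defined as

$$a \,\%\, b = a - b \cdot \operatorname{truncate}(a/b), \quad \text{for } b \neq 0, \tag{F.2}$$

is often used in this context, but yields the same results only if a and b have the same sign (e.g., for positive operands a > 0 and b > 0) or if the remainder is zero. For example,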

 13 mod  4 =  1      vs.    13 %  4 =  1
 13 mod -4 = -3             13 % -4 =  1
-13 mod  4 =  3            -13 %  4 = -1
-13 mod -4 = -1            -13 % -4 = -1
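Since Java 8, the floored variants are available as Math.floorMod() and Math.floorDiv() (see footnote 2). The following Java sketch (hypothetical class and method names) wraps them according to Eqn. (F.1), including the case b = 0, which Math.floorMod() itself does not accept, and prints the comparison table shown above:

```java
class ModCheck {
    // mod as defined in Eqn. (F.1); Math.floorMod throws for b == 0, so handle it first
    static int mod(int a, int b) {
        return (b == 0) ? a : Math.floorMod(a, b);
    }

    // Remainder operator as in Eqn. (F.2): the result takes the sign of a
    static int rem(int a, int b) {
        return a % b;
    }

    public static void main(String[] args) {
        int[][] pairs = {{13, 4}, {13, -4}, {-13, 4}, {-13, -4}};
        for (int[] p : pairs)
            System.out.printf("%4d mod %3d = %3d    %4d %% %3d = %3d%n",
                    p[0], p[1], mod(p[0], p[1]), p[0], p[1], rem(p[0], p[1]));
    }
}
```

The defining identity $a = b \cdot \lfloor a/b \rfloor + (a \bmod b)$ can be verified in Java as a == b * Math.floorDiv(a, b) + mod(a, b) for any b ≠ 0.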

F.1.3 Unsigned Byte Data

Most grayscale and indexed images in Java and ImageJ are composed of pixels of type byte, and the same holds for the individual components of most color images. A single byte consists of eight bits and can thus represent $2^8 = 256$ different bit patterns or values, usually mapped to the numeric range 0, ..., 255. Unfortunately, Java (unlike C and C++) does not provide a suitable "unsigned" 8-bit data type. The primitive Java type byte is "signed", using one of its eight bits for the ± sign, and is intended to hold values in the range -128, ..., +127.

Java’s  byte  data  can  still  be  used  to  represent  the  values  0  to 
255,  but  conversions  must  take  place  to  perform  proper  arithmetic 
computations.  For  example,  after  execution  of  the  statements 

int a = 200;
byte b = (byte) a;

the  variables  a  (32-bit  int)  and  b  (8-bit  byte)  contain  the  binary 
patterns 

a  =  00000000000000000000000011001000 
b  =  11001000 

Interpreted as a (signed) byte value, with the leftmost bit⁴ as the sign bit, the variable b has the decimal value -56. Thus after the statement

int a1 = b;  // a1 == -56

the value of the new int variable a1 is -56! To (ab)use signed byte data as unsigned data, we can circumvent Java's standard conversion mechanism by treating the content of b as a logical (i.e., nonarithmetic) bit pattern; for example, by

int a2 = (0xff & b);  // a2 == 200

where 0xff (in hexadecimal notation) is an int value with the binary bit pattern 00000000000000000000000011111111 and & is the bitwise AND operator. In this expression, b is still promoted to a (sign-extended) int first, but the AND operation clears all except the lowest eight bits. Now the variable a2 contains the right integer value (200) and we thus have a way to use Java's (signed) byte data type for storing unsigned values. Within ImageJ, access to pixel data is routinely implemented in this way, which is considerably faster than using the convenience methods getPixel() and putPixel().

2 Starting with Java version 1.8 the mod operation (as defined in Eqn. (F.1)) is implemented by the standard method Math.floorMod(a, b).

3 The definition in Eqn. (F.1) is not restricted to integer operands.

4 Java uses the standard "two's-complement" representation, where a sign bit = 1 stands for a negative value.

Table F.1
Mathematical methods and constants defined by Java's Math class.
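The conversions described above can be collected in two small helpers. The following Java sketch uses hypothetical names; since Java 8 the standard method Byte.toUnsignedInt(byte) performs the same masking as toUnsigned():

```java
class UnsignedBytes {
    // Interpret a (signed) byte as an unsigned value 0..255 by masking the sign extension
    static int toUnsigned(byte b) {
        return 0xff & b;
    }

    // Store a value 0..255 in a byte; values 128..255 appear as negative byte values
    static byte fromUnsigned(int a) {
        return (byte) a;
    }
}
```

For example, fromUnsigned(200) yields the byte value -56, and toUnsigned() maps it back to 200; every value from 0 to 255 survives this round trip.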

F.1.4 Mathematical Functions in Class Math

Java provides most standard mathematical functions as static methods in class Math, as listed in Table F.1. The Math class is part of the java.lang package and thus requires no explicit import to be used. Most Math methods accept arguments of type double and also return values of type double. As a simple example, a typical use of the cosine function y = cos(x) is

double  x; 

double y = Math.cos(x);

Similarly, the Math class defines some common numerical constants as static variables; for example, the value of $\pi$ could be obtained by

double pi = Math.PI;


double abs(double a)              double max(double a, double b)
int    abs(int a)                 float  max(float a, float b)
float  abs(float a)               int    max(int a, int b)
long   abs(long a)                long   max(long a, long b)
double ceil(double a)             double min(double a, double b)
double floor(double a)            float  min(float a, float b)
int    floorMod(int a, int b)     int    min(int a, int b)
long   floorMod(long a, long b)   long   min(long a, long b)
double rint(double a)             long   round(double a)
double random()                   int    round(float a)
double toDegrees(double rad)      double toRadians(double deg)
double sin(double a)              double asin(double a)
double cos(double a)              double acos(double a)
double tan(double a)              double atan(double a)
double atan2(double y, double x)  double log(double a)
double exp(double a)              double sqrt(double a)
double pow(double a, double b)
double E                          double PI
F.1.5 Rounding


Java’s  Math  class  (confusingly)  offers  three  different  methods  for 
rounding  floating-point  values: 

double  rint (double  x) 
long  round (double  x) 
int  round (float  x) 

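For example, a double value x can be rounded to int in any of the following ways:

double x; int k;
k = (int) Math.rint(x);
k = (int) Math.round(x);
k = Math.round((float) x);

The methods differ in how they treat ties: Math.rint() rounds to the nearest even integer (e.g., Math.rint(2.5) is 2.0), whereas Math.round() rounds ties upward (Math.round(2.5) is 3 and Math.round(-2.5) is -2).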

If the operand x is known to be non-negative (as is typically the case with pixel values), rounding can be accomplished without any method calls by

k = (int) (x + 0.5);  // only if x >= 0

In this case, the expression (x + 0.5) is first computed as a floating-point (double) value, which is then truncated (toward zero) by the explicit (int) typecast. For negative x, this truncation goes in the wrong direction; for example, (int) (-2.7 + 0.5) yields -2 instead of -3.
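The following Java sketch (hypothetical names) puts the rounding variants side by side. The method roundAny() is an addition for illustration: it uses Math.floor() instead of the (int) cast, so it also handles negative values and rounds ties upward, like Math.round():

```java
class RoundingDemo {
    // Add 0.5 and truncate: correct only for x >= 0
    static int roundNonNegative(double x) {
        return (int) (x + 0.5);
    }

    // Add 0.5 and round down: also correct for negative x (ties go upward)
    static int roundAny(double x) {
        return (int) Math.floor(x + 0.5);
    }

    public static void main(String[] args) {
        double[] xs = {2.5, 3.5, -2.5, -2.7};
        for (double x : xs)
            System.out.println(x + ": rint=" + Math.rint(x) + ", round=" + Math.round(x)
                    + ", (int)(x+0.5)=" + roundNonNegative(x) + ", floor(x+0.5)=" + roundAny(x));
    }
}
```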


F.1.6 Inverse Tangent Function

The inverse tangent function $\varphi = \tan^{-1}(a)$ or $\varphi = \arctan(a)$ is used in several places in the main text. This function is implemented by the method atan(double a) in Java's Math class (Table F.1). The return value of atan() is in the range $[-\frac{\pi}{2}, \frac{\pi}{2}]$ and thus restricted to only two of the four quadrants. Without any additional constraints, the resulting angle is ambiguous. In many practical situations, however, $a$ is given as the ratio of two catheti ($\Delta x$, $\Delta y$) of a right-angled triangle in the form

$$\varphi = \arctan\left(\frac{\Delta y}{\Delta x}\right), \tag{F.3}$$

for which we introduced the two-parameter function

$$\varphi = \mathrm{ArcTan}(x, y) \tag{F.4}$$

in the main text. The function ArcTan(x, y) is implemented by the standard method atan2(dy, dx) in Java's Math class (note the reversed parameters, though) and returns an unambiguous angle $\varphi$ in the range $[-\pi, \pi]$; that is, in any of the four quadrants of the unit circle.⁵ Also, the atan2() method returns a defined value even if both arguments are zero (e.g., Math.atan2(0.0, 0.0) is 0.0).

5  The  function  atan2(dy,dx)  is  available  in  most  current  programming 
languages,  including  Java,  C,  and  C++. 
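The quadrant problem is easy to demonstrate. The following Java sketch (hypothetical names) computes the angle of a vector (dx, dy) once with atan() applied to the ratio and once with atan2(), mirroring the two-parameter function ArcTan(x, y) from the main text:

```java
class AngleDemo {
    // Angle from the ratio dy/dx only: loses the quadrant and fails for dx == dy == 0
    static double angleAtan(double dx, double dy) {
        return Math.atan(dy / dx);
    }

    // ArcTan(dx, dy) as in Eqn. (F.4): note the reversed argument order of Math.atan2
    static double arcTan(double dx, double dy) {
        return Math.atan2(dy, dx);
    }
}
```

For the vector (-1, 1), angleAtan() returns $-\pi/4$ whereas arcTan() returns the correct angle $3\pi/4$; for (0, 0), angleAtan() yields NaN (since 0.0/0.0 is NaN), while arcTan() returns 0.0.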
F.1.7 Classes Float and Double

The representation of floating-point numbers in Java follows the IEEE standard, and thus the types float and double include the values

Float.MIN_VALUE,            Double.MIN_VALUE,
Float.MAX_VALUE,            Double.MAX_VALUE,
Float.POSITIVE_INFINITY,    Double.POSITIVE_INFINITY,
Float.NEGATIVE_INFINITY,    Double.NEGATIVE_INFINITY,
Float.NaN,                  Double.NaN.


These values are defined as constants in the corresponding wrapper classes Float and Double, respectively. Note that MIN_VALUE denotes the smallest positive value, not the most negative one (which is -MAX_VALUE). If any INFINITY or NaN⁶ value occurs in the course of a computation (e.g., as the result of dividing by zero),⁷ Java continues without raising an error, so incorrect values may ripple through a whole chain of calculations, making the actual bugs difficult to locate.
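The following Java sketch (hypothetical names) shows how a single NaN silently propagates through a computation and how such values can be detected with the standard methods Double.isNaN() and Double.isInfinite():

```java
class SpecialValues {
    // True if v is neither NaN nor infinite
    static boolean isRegular(double v) {
        return !Double.isNaN(v) && !Double.isInfinite(v);
    }

    // Arithmetic mean; one NaN element turns the whole result into NaN
    static double mean(double[] v) {
        double s = 0;
        for (double x : v)
            s += x;
        return s / v.length;
    }

    public static void main(String[] args) {
        double[] data = {1.0, 0.0 / 0.0, 3.0};   // 0.0 / 0.0 produces NaN, no exception
        double m = mean(data);
        System.out.println(m + " regular? " + isRegular(m));   // prints "NaN regular? false"
    }
}
```

Note that a NaN value is not even equal to itself (x == x is false if x is NaN), so comparisons with Double.NaN cannot be used for detection.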


F.1.8  Testing  Floating-Point  Values  Against  Zero 

Comparing floating-point values or testing them for zero is a nontrivial issue and a frequent cause of errors. In particular, one should never write

if (x == 0.0) {...}  // problem!

if x is a floating-point variable. This is often needed, for example, to make sure that it is safe to divide another quantity by x. The aforementioned test, however, is not sufficient since x may be nonzero but still too small as a divisor.

A much better alternative is to test if x is "close" to zero, that is, within some small positive/negative (epsilon) interval. While the proper choice of this interval depends on the specific situation, the following settings are usually sufficient for safe operation:⁸

static final float EPSILON_FLOAT = 1e-7f;
static final double EPSILON_DOUBLE = 2e-16;

float x;
double y;

if (Math.abs(x) < EPSILON_FLOAT) {
  ...  // x is practically zero
}

if (Math.abs(y) < EPSILON_DOUBLE) {
  ...  // y is practically zero
}


6 NaN stands for "not a number".

7 In Java, this only holds for floating-point operations, whereas integer division by zero always causes an exception.

8 These settings account for the limited machine accuracy ($\epsilon_m$) of the IEEE 754 standard types float ($\epsilon_m \approx 1.19 \cdot 10^{-7}$) and double ($\epsilon_m \approx 2.22 \cdot 10^{-16}$) [190, Ch. 1, Sec. 1.1.2].
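The following Java sketch wraps the epsilon tests above into small helper methods (hypothetical names) and uses them to guard a division. The fallback value returned for a practically zero divisor is an assumption of this sketch; what is appropriate depends on the application. Note also that fixed absolute thresholds like these suit quantities of magnitude around 1; for very large or very small values a tolerance relative to the operands may be more suitable.

```java
class ZeroTest {
    static final float EPSILON_FLOAT = 1e-7f;
    static final double EPSILON_DOUBLE = 2e-16;

    static boolean isZero(float x) {
        return Math.abs(x) < EPSILON_FLOAT;
    }

    static boolean isZero(double x) {
        return Math.abs(x) < EPSILON_DOUBLE;
    }

    // Division guarded by the zero test; returns fallback if b is practically zero
    static double safeDivide(double a, double b, double fallback) {
        return isZero(b) ? fallback : a / b;
    }
}
```

A typical case is the value 0.1 + 0.2 - 0.3, which is not exactly 0.0 in double arithmetic (it is about 5.6e-17) but is correctly classified as practically zero by isZero().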
F.2 Arrays in Java

F.2.1  Creating  Arrays 

Unlike in most traditional programming languages (such as FORTRAN or C), arrays in Java can be created dynamically, meaning that the size of an array can be specified at runtime using the value of some variable or arithmetic expression. For example:

int N = 20;
int[] A = new int[N];
int[] B = new int[N * N];

Once allocated, however, the size of any Java array is fixed and cannot be subsequently altered.⁹ Note that Java arrays may be of length zero!

After its definition, an array variable can be assigned any other compatible array or the constant value null, for example,¹⁰

A = B;     // A now references the data in B
B = null;

With the assignment A = B, the array initially referenced by A becomes inaccessible (unless it is referenced elsewhere) and thus turns into garbage. In contrast to C and C++, where unnecessary storage needs to be deallocated explicitly, this is taken care of in Java by its built-in "garbage collector". It is also convenient that newly created arrays of numerical element types (int, float, double, etc.) are automatically initialized to zero.
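The following Java sketch (hypothetical names) illustrates the points above: array sizes determined at runtime, zero-initialized elements, legal zero-length arrays, and the fact that assigning one array variable to another copies the reference, not the data:

```java
class ArrayBasics {
    // Creates an array whose size is known only at runtime
    static int[] ramp(int n) {
        int[] a = new int[n];              // all elements start out as 0
        for (int i = 0; i < a.length; i++)
            a[i] = i;
        return a;
    }

    // Array variables are references: b = a does not copy the elements
    static boolean sharesData() {
        int[] a = new int[3];
        int[] b = a;
        b[0] = 7;
        return a[0] == 7;                  // true: a and b denote the same array
    }

    public static void main(String[] args) {
        int n = 20;
        int[] A = new int[n];
        int[] B = new int[n * n];
        System.out.println(A.length + " " + B.length);   // prints "20 400"
        System.out.println(A[5]);                        // prints "0"
        System.out.println(ramp(0).length);              // prints "0"
    }
}
```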

F.2.2 Array Size

Since an array may be created dynamically, it is important that its actual size can be determined at runtime. This is done by accessing the length attribute:¹¹

int k = A.length;  // number of elements in A

The  size  is  a  property  of  the  array  itself  and  can  therefore  be  obtained 
inside  any  method  from  array  arguments  passed  to  it.  Thus  (unlike 
in  C,  for  example)  it  is  not  necessary  to  pass  the  size  of  an  array  as 
a  separate  function  argument. 

If an array has more than one dimension, the size (length) along every dimension must be queried separately (see Sec. F.2.4). Also, arrays are not necessarily rectangular; for example, the rows of a 2D array may have different lengths (including zero).

F.2.3 Accessing Array Elements

In Java, the index of the first array element is always 0 and the index of the last element is N-1 for an array with a total of N elements. To iterate through a 1D array A of arbitrary size, one would typically use a construct like

9 For additional flexibility, Java provides a number of universal container types (e.g., the interfaces Set and List and their implementing classes) for a wide range of applications.

10  This  is  not  possible  if  the  array  variable  was  defined  with  the  final 
attribute. 

11  Notice  that  the  length  attribute  of  an  array  is  not  a  method! 
for (int i = 0; i < A.length; i++) {
  // do something with A[i]
}

Alternatively, if only the array values are relevant and the array index (i) is not needed, one could use the following (even simpler) loop construct:

for  (int  a  :  A)  { 

//  do  something  with  array  values  a 

} 

In both cases, the Java compiler (in particular, the just-in-time compiler of the Java runtime) can generate very efficient code, since the loop structure makes it evident that the for loop does not access any elements outside the array limits, so explicit boundary checks can often be omitted at execution time. This fact is very important for implementing efficient image processing programs in Java.

Images in Java and ImageJ are usually stored as 1D arrays (accessible through the ImageProcessor method getPixels() in ImageJ), with pixels arranged in row-first order.¹² Statistical calculations and most point operations can thus be efficiently implemented by directly accessing the underlying 1D array. For example, the run method of the contrast enhancement plugin in Prog. 4.1 (see Chapter 4, p. 58) could also be implemented in the following manner:

public void run(ImageProcessor ip) {
  // ip is assumed to be of type ByteProcessor
  byte[] pixels = (byte[]) ip.getPixels();
  for (int i = 0; i < pixels.length; i++) {
    int a = 0xFF & pixels[i];         // direct read operation
    int b = (int) (a * 1.5 + 0.5);
    if (b > 255)
      b = 255;
    pixels[i] = (byte) (0xFF & b);    // direct write operation
  }
}
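Because the loop above only touches a plain byte array, the same point operation can be tried without ImageJ. The following Java sketch (hypothetical names) applies it to an ordinary byte[] so that the unsigned read, rounding, clamping, and write-back steps can be inspected directly:

```java
class ContrastDemo {
    // Same point operation as the run() method above, applied to a plain byte array:
    // multiply by 1.5, round, clamp to 255, and store back as an (unsigned) byte
    static void increaseContrast(byte[] pixels) {
        for (int i = 0; i < pixels.length; i++) {
            int a = 0xFF & pixels[i];          // unsigned read
            int b = (int) (a * 1.5 + 0.5);     // scale and round (a >= 0)
            if (b > 255)
                b = 255;                       // clamp
            pixels[i] = (byte) b;              // low 8 bits hold the unsigned value
        }
    }
}
```

For the input values 100, 200, 0 and 255, the results read back with 0xFF & pixels[i] are 150, 255, 0 and 255.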


F.2.4  2D  Arrays 

Multidimensional arrays are a frequent source of confusion. In Java, all arrays are 1D in principle, and multidimensional arrays are implemented as 1D arrays of arrays, etc. (see Fig. F.1). If, for example, the 3 × 3 matrix

$$A = \begin{bmatrix} a_{0,0} & a_{0,1} & a_{0,2} \\ a_{1,0} & a_{1,1} & a_{1,2} \\ a_{2,0} & a_{2,1} & a_{2,2} \end{bmatrix} = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix} \tag{F.5}$$

is defined as a 2D int array,

int[][] A = {{1, 2, 3},
             {4, 5, 6},
             {7, 8, 9}};
then A is actually a 1D array with three elements, each of which is again a 1D array. The elements A[0], A[1] and A[2] are of type int[] and correspond to the three rows of the matrix A (see Fig. F.1).

12 This means that horizontally adjacent image pixels are stored next to each other in computer memory.

Fig. F.1
Layout of elements of a 2D Java array (corresponding to Eqn. (F.5)). In Java, multidimensional arrays are generally implemented as 1D arrays whose elements are again 1D arrays.

The usual assumption is that the array elements are arranged in row-first order, as illustrated in Fig. F.1. The first index thus corresponds to the row number r and the second index corresponds to the column number c, that is,

$$a_{r,c} = A[r][c]. \tag{F.6}$$

This conforms to the mathematical convention and makes the array definition in the code segment above look exactly the same as the original matrix in Eqn. (F.5). Note that in this scheme the first array index corresponds to the vertical coordinate and the second index to the horizontal coordinate.

However, if an array is used to specify the contents of an image $I(u, v)$ or a filter kernel $H(i, j)$, we usually assume that the first index ($u$ or $i$, respectively) is associated with the horizontal x-coordinate and the second index ($v$ or $j$, respectively) with the vertical y-coordinate. For example, if we represent the filter kernel

$$H = \begin{bmatrix} h_{0,0} & h_{1,0} & h_{2,0} \\ h_{0,1} & h_{1,1} & h_{2,1} \\ h_{0,2} & h_{1,2} & h_{2,2} \end{bmatrix} = \begin{bmatrix} -1 & -2 & 0 \\ -2 & 0 & 2 \\ 0 & 2 & 1 \end{bmatrix}$$

as a 2D Java array,

double[][] H = {{-1, -2, 0},
                {-2,  0, 2},
                { 0,  2, 1}};

then the row and column indexes must be reversed in order to access the correct elements. In this case we have the relation

$$h_{i,j} = H[j][i], \tag{F.7}$$

that is, the ordering of the indexes for array H is not the same as for the i/j coordinates of the filter kernel. In this case the first array index (j) corresponds to the vertical coordinate and the second index (i) to the horizontal coordinate. The advantage is that (as shown in the aforementioned code segment) the definition of the filter kernel
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can  be  written  in  the  usual  matrix  form13  (otherwise  we  would  have 
to  specify  the  transposed  kernel  matrix). 
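The kernel used above happens to be symmetric (it equals its own transpose), so swapping the indexes makes no visible difference there. The following sketch therefore uses a hypothetical asymmetric 3 × 3 kernel to show the index swap of Eqn. (F.7); the accessor h() is an illustrative helper, not a library method.

```java
class KernelIndexDemo {
    // Hypothetical asymmetric 3 x 3 kernel, written in the usual matrix form:
    // each inner array is one matrix row, i.e., one value of the vertical index j.
    static final double[][] H = {
        {-1, 0, 1},
        {-2, 0, 2},
        {-1, 0, 1}
    };

    // Returns h_{i,j} (i = horizontal, j = vertical) by swapping the
    // array indexes, as in Eqn. (F.7): h_{i,j} = H[j][i].
    static double h(double[][] kernel, int i, int j) {
        return kernel[j][i];
    }

    public static void main(String[] args) {
        int width = H[0].length;   // extent along i (number of columns)
        int height = H.length;     // extent along j (number of rows)
        for (int j = 0; j < height; j++) {
            StringBuilder line = new StringBuilder();
            for (int i = 0; i < width; i++) {
                line.append(h(H, i, j)).append(' ');
            }
            System.out.println(line.toString().trim());  // prints one matrix row
        }
        System.out.println(h(H, 2, 1));  // 2.0: right column, middle row
        System.out.println(h(H, 1, 2));  // 0.0: middle column, bottom row
    }
}
```

Iterating with j in the outer loop and i in the inner loop reproduces the matrix exactly as it is written in the source code.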

If a 2D array is merely used as an image container (whose contents are never defined in matrix form), any convention can be used for the ordering of the indexes. For example, the ImageJ method getFloatArray() of class ImageProcessor, when called in the form

float[][] I = ip.getFloatArray();

returns  the  image  as  a  2D  array  (I),  whose  indexes  are  arranged  in 
the  usual  x/y  order,  that  is, 

$$I(x,y) = \mathtt{I}[x][y] . \tag{F.8}$$

In this case, the image pixels are arranged in column-first order: each sub-array I[x] holds one image column, so vertically adjacent elements are stored next to each other in memory.
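The x/y layout of Eqn. (F.8) can be mimicked in plain Java without ImageJ. The following sketch assumes a row-major 1D pixel array as the source (element index y · width + x); the helper toXYArray is hypothetical and not part of the ImageJ API. It only illustrates the resulting index order and the fact that each sub-array I[x] is one image column.

```java
class XYArrayDemo {
    // Hypothetical helper: converts a row-major pixel array (index = y * width + x)
    // into a 2D array indexed as I[x][y], i.e., one sub-array per image column.
    static float[][] toXYArray(float[] pixels, int width, int height) {
        float[][] I = new float[width][height];
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                I[x][y] = pixels[y * width + x];
            }
        }
        return I;
    }

    public static void main(String[] args) {
        int width = 3, height = 2;
        // Row-major data of a 3 x 2 image: row 0 = {0, 1, 2}, row 1 = {10, 11, 12}.
        float[] pixels = {0, 1, 2, 10, 11, 12};
        float[][] I = toXYArray(pixels, width, height);
        System.out.println(I.length + " x " + I[0].length);   // 3 x 2 (width x height)
        System.out.println(I[2][1]);                          // 12.0, pixel at x = 2, y = 1
        // I[0] is the leftmost column: vertically adjacent pixels share one sub-array.
        System.out.println(java.util.Arrays.toString(I[0]));  // [0.0, 10.0]
    }
}
```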

Size  of  multi-dimensional  arrays 

The size of a multi-dimensional array can be obtained by querying the size of its sub-arrays. For example, given the following 3D array with dimensions P × Q × R,

int A[][][] = new int[P][Q][R];

the  size  of  A  along  its  three  dimensions  is  obtained  by  the  statements 

int p = A.length;        // = P
int q = A[0].length;     // = Q
int r = A[0][0].length;  // = R
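The three statements above can be wrapped into a small, self-contained method. This is a sketch for rectangular arrays only; dims() is an illustrative name, and it assumes P > 0 and Q > 0 so that A[0] and A[0][0] exist.

```java
class ArraySizeDemo {
    // Returns {P, Q, R} for a rectangular 3D array by querying the
    // lengths of the array and of its first sub-arrays.
    static int[] dims(int[][][] A) {
        int p = A.length;        // = P
        int q = A[0].length;     // = Q (requires P > 0)
        int r = A[0][0].length;  // = R (requires Q > 0)
        return new int[] {p, q, r};
    }

    public static void main(String[] args) {
        int P = 2, Q = 3, R = 4;          // example dimensions
        int A[][][] = new int[P][Q][R];
        System.out.println(java.util.Arrays.toString(dims(A)));  // [2, 3, 4]
    }
}
```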

This works at least for “rectangular” Java arrays, that is, multi-dimensional arrays with all sub-arrays at the same level having identical lengths, which is guaranteed by the array initialization in the aforementioned case. However, every 1D sub-array of A may be replaced by a suitable 1D array of different length,¹⁴ for example, by the statement

A[0][0] = new int[0];

To avoid “index-out-of-bounds” errors, the length of each sub-array should be determined dynamically. The following example shows a “bullet-proof” iteration over all elements of a 3D array A whose sub-arrays may have different lengths or may even be empty:

int A[][][];
...
for (int i = 0; i < A.length; i++) {
    for (int j = 0; j < A[i].length; j++) {
        for (int k = 0; k < A[i][j].length; k++) {
            // safely access A[i][j][k]
        }
    }
}
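A runnable version of this loop is sketched below; countElements() is an illustrative name. It builds a 2 × 3 × 4 array, replaces one sub-array by an empty array and another by a longer one, and then counts all elements that the loop visits.

```java
class RaggedArrayDemo {
    // Counts all elements of a possibly "ragged" 3D array by querying
    // the length of every sub-array individually.
    static int countElements(int[][][] A) {
        int n = 0;
        for (int i = 0; i < A.length; i++) {
            for (int j = 0; j < A[i].length; j++) {
                for (int k = 0; k < A[i][j].length; k++) {
                    n++;   // A[i][j][k] can be accessed safely here
                }
            }
        }
        return n;
    }

    public static void main(String[] args) {
        int[][][] A = new int[2][3][4];   // 2 x 3 x 4 = 24 elements
        A[0][0] = new int[0];             // replace one sub-array by an empty one
        A[1][2] = new int[7];             // and another one by a longer one
        System.out.println(countElements(A));  // 24 - 4 + 3 = 23
    }
}
```

Like the original loop, this sketch assumes that no sub-array is null; a null entry would still cause a NullPointerException.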


13  This  scheme  is  used,  for  example,  in  the  implementation  of  the  3x3 
filter  plugin  in  Prog.  5.2  (Chapter  5,  p.  95). 

14 Even if the array A was originally declared final, the structure and contents of its sub-arrays may be modified at any time.
F.2 Arrays in Java


In  Java,  as  mentioned  earlier,  we  can  create  arrays  dynamically;  that 
is,  the  size  of  an  array  can  be  specified  at  runtime.  This  is  convenient 
because  we  can  adapt  the  size  of  the  arrays  to  the  given  problem.  For 
example,  we  could  write 

Corner[] corners = new Corner[n];

to  create  an  array  that  can  hold  n  objects  of  type  Corner  (as  defined 
in  Chapter  7,  Sec.  7.3).  Note  that  the  new  array  corners  is  not  filled 
with  corners  yet  but  initialized  with  null  references,  so  the  newly 
created  array  holds  no  objects  at  all.  We  can  insert  a  Corner  object 
into  its  first  (or  any  other)  cell,  for  example,  by 

corners[0] = new Corner(10, 20, 6789.0f);
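The following sketch illustrates that a freshly created object array contains only null references until objects are inserted. Because the Corner class of Chapter 7 is not reproduced here, a hypothetical minimal stand-in with fields x, y, q and a matching three-argument constructor is assumed.

```java
class CornerArrayDemo {
    // Hypothetical minimal stand-in for the Corner class of Chapter 7.
    static class Corner {
        final float x, y, q;
        Corner(float x, float y, float q) { this.x = x; this.y = y; this.q = q; }
    }

    // Counts the cells of the array that actually hold an object.
    static int countNonNull(Corner[] corners) {
        int n = 0;
        for (Corner c : corners) {
            if (c != null) n++;
        }
        return n;
    }

    public static void main(String[] args) {
        int n = 5;                                  // size chosen at runtime
        Corner[] corners = new Corner[n];           // all cells are null
        System.out.println(countNonNull(corners));  // 0
        corners[0] = new Corner(10, 20, 6789.0f);
        System.out.println(countNonNull(corners));  // 1
        System.out.println(corners.length);         // 5 (the array size is unchanged)
    }
}
```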


F.2.6 Searching for Minimum and Maximum Values

Unfortunately,  the  standard  Java  API  does  not  provide  methods  for 
retrieving  the  minimum  and  maximum  values  of  a  numeric  array. 
Although  these  values  are  easily  found  by  iterating  over  all  elements 
of  the  sequence,  care  must  be  taken  regarding  the  initialization. 

For example, finding the extreme values of a sequence of int values could be accomplished as follows:¹⁵

int[] A = ...

int minval = Integer.MAX_VALUE;
int maxval = Integer.MIN_VALUE;
for (int val : A) {
    minval = Math.min(minval, val);
    maxval = Math.max(maxval, val);
}
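Footnote 15 mentions initializing minval and maxval with the first array element instead. A minimal sketch of that variant follows; minMax() is an illustrative name. Unlike the constant-based version, it needs a non-empty array, so this sketch rejects empty input explicitly.

```java
class IntMinMaxDemo {
    // Returns {min, max} of a non-empty int array, starting from A[0].
    static int[] minMax(int[] A) {
        if (A.length == 0) {
            throw new IllegalArgumentException("empty array");
        }
        int minval = A[0];
        int maxval = A[0];
        for (int i = 1; i < A.length; i++) {
            minval = Math.min(minval, A[i]);
            maxval = Math.max(maxval, A[i]);
        }
        return new int[] {minval, maxval};
    }

    public static void main(String[] args) {
        int[] A = {7, -3, 12, 0, 5};
        int[] mm = minMax(A);
        System.out.println(mm[0] + " " + mm[1]);  // -3 12
    }
}
```

For an empty array, the version with Integer.MAX_VALUE and Integer.MIN_VALUE silently returns these two constants, which may or may not be the desired behavior.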

Note the use of the constants MIN_VALUE and MAX_VALUE, which are defined by the wrapper classes of all numeric Java types (Integer, Long, Float, Double, etc.).

However, in the case of floating-point values, these are not the proper values for initialization.¹⁶ Instead, POSITIVE_INFINITY and NEGATIVE_INFINITY should be used, as shown in the following code segment:

double[] B = ...

double minval = Double.POSITIVE_INFINITY;
double maxval = Double.NEGATIVE_INFINITY;
for (double val : B) {
    minval = Math.min(minval, val);
    maxval = Math.max(maxval, val);
}
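The pitfall is easy to demonstrate. In the following sketch (method names are illustrative), the maximum of an array containing only negative values is computed twice: once starting from Double.MIN_VALUE, which is the smallest positive double and therefore larger than every negative element, and once starting from Double.NEGATIVE_INFINITY.

```java
class DoubleMinMaxDemo {
    // Wrong: Double.MIN_VALUE is the smallest positive double (about 4.9e-324),
    // so it is larger than any negative value in the array.
    static double maxWrong(double[] B) {
        double maxval = Double.MIN_VALUE;
        for (double val : B) maxval = Math.max(maxval, val);
        return maxval;
    }

    // Correct: start from negative infinity.
    static double maxRight(double[] B) {
        double maxval = Double.NEGATIVE_INFINITY;
        for (double val : B) maxval = Math.max(maxval, val);
        return maxval;
    }

    public static void main(String[] args) {
        double[] B = {-3.5, -1.25, -7.0};   // only negative values
        System.out.println(maxWrong(B));    // 4.9E-324 (not an element of B)
        System.out.println(maxRight(B));    // -1.25
    }
}
```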


15 Alternatively, one could initialize minval and maxval with the first array element A[0].

16 Because Double.MIN_VALUE and Float.MIN_VALUE specify the smallest positive (nonzero) values, not the most negative ones.
Appendix F: Java Notes

F.2.7 Sorting Arrays

Arrays can be sorted efficiently with the standard method

Arrays.sort(type[] arr)

in class java.util.Arrays, where arr can be any array of primitive type (int, float, etc.) or an array of objects. In the latter case, the array must not contain null entries. Also, the class of every contained object must implement the Comparable interface, that is, provide a public method compareTo() that returns a negative int value, zero, or a positive int value (typically -1, 0, or 1), depending upon the intended ordering relation. For example, the class Corner defines the compareTo() method as follows:

public class Corner implements Comparable<Corner> {
    float x, y, q;
    ...
    @Override
    public int compareTo(Corner other) {
        // larger q values come first (descending order)
        if (this.q > other.q) return -1;
        else if (this.q < other.q) return 1;
        else return 0;
    }
}
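The following sketch sorts a small array of corners with Arrays.sort(). The nested Corner class is a hypothetical minimal stand-in (fields x, y, q and a three-argument constructor are assumed). Its compareTo() uses Float.compare(other.q, this.q), which gives the same descending order as the method above for ordinary values; the two differ only for NaN and for 0.0 versus -0.0. For comparison, a primitive array is sorted as well.

```java
import java.util.Arrays;

class CornerSortDemo {
    // Hypothetical minimal stand-in for Corner, ordered by decreasing q.
    static class Corner implements Comparable<Corner> {
        final float x, y, q;
        Corner(float x, float y, float q) { this.x = x; this.y = y; this.q = q; }

        @Override
        public int compareTo(Corner other) {
            return Float.compare(other.q, this.q);  // larger q first
        }
    }

    public static void main(String[] args) {
        Corner[] corners = {
            new Corner(10, 20, 150.0f),
            new Corner(30, 40, 980.0f),
            new Corner(50, 60, 420.0f)
        };
        Arrays.sort(corners);                  // the array must not contain null entries
        for (Corner c : corners) {
            System.out.println(c.q);           // 980.0, 420.0, 150.0
        }

        int[] values = {5, 1, 4};
        Arrays.sort(values);                   // primitive arrays: ascending order
        System.out.println(Arrays.toString(values));  // [1, 4, 5]
    }
}
```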
417, 714

*,  100-102,  125,  283,  490,  541, 
616,  714,  739 
®,  568,  714 
⊗, 714, 723, 751
x, 714
⊕, 185, 714
⊖, 186, 714
o,  714 

•  ,  714 

∂, 123, 397, 715, 736, 737
∇, 123, 392, 397, 442-444, 715, 736

∇², 139, 434, 611, 715, 738, 763

713,  714 

∪, 717
∩, 717
\, 717

•  •  • ,  714 

•  •  • ,  714 
A,  715 

V,  715 

714,  756 
«,  714 

=,  714 
714 
A  714 
:=,  714 

I  |,  714,  717 

II  II,  714 
H,714 
LJ,  714 
0,  715 

μ, 716, 749, 756
σ, 716
r,  716 

&  (operator),  768 
| (operator), 296
/ (operator), 714
% (operator), 767
& (operator), 296
>> (operator), 296
<< (operator), 296


A 

abs  (method),  84,  768 
absolute  value,  714 
accumulator,  164 
achromatic,  308 
acos  (method),  768 
AdaptiveThresholder (class), 284, 286

AdaptiveThresholdGauss  (alg.),  285 
ADD  (constant),  85 
add  (method),  84,  157 
addChoice  (method),  88 
addGaussianNoise  (method),  758, 
759 

addNumericField  (method),  88 
adj,  715 

adjugate  matrix,  521,  715 
Adobe 

Illustrator,  12 

Photoshop,  63,  96,  116,  143 
RGB,  354 
affine 

combination,  369 
mapping,  515-517,  526 
AffineMapping  (class),  532,  604 
aggregate  distance,  379 
trimmed,  385 

aliasing,  468,  472,  475,  476,  487, 
556 

alpha 

channel,  14,  296 
value,  85,  296 
ambient  lighting,  345 
amplitude,  454,  455 
Analyze  (menu),  35 
AND  (constant),  84 
and,  197,  715 

angleFromIndex (method), 175
angular  frequency,  454,  472,  476, 
482 

anisotropic  diffusion,  433-448 
Apache  Commons  Math  library, 
696,  727-729,  731 
applyTable  (method),  71,  79,  80, 
83 
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applyTo  (method),  200,  385,  389, 
449,  532-534,  537,  606 
approximation,  547,  548 
ArcTan,  236,  715,  769 
area 

polygon,  231 
region,  231 

arithmetic  operation,  84 
array 
1D, 771
2D,  772 

accessing  elements,  771 
creation,  771 
in  Java,  771 
size,  771 
sorting,  776 
ArrayList  (class),  155 
Arrays  (class),  324,  776 
ARToolkit,  173 
asin  (method),  768 
associativity,  186 
atan  (method),  768 
atan2  (method),  715,  768,  769 
auto-contrast,  61 
modified,  62 
AVERAGE  (constant),  85 
AVI,  608,  664 
AWT,  296,  360 

B 

background,  181,  254 
BackgroundMode  (class),  286 
bandwidth,  468,  620,  623,  762 
Bartlett  window,  492,  494,  495 
basis  function,  471-475,  481,  487, 
503,  504,  510 

Bayesian  decision  making,  268 
BeanShell,  34 

Bernsen  thresholding,  274-275 

BernsenThreshold  (alg.),  275 
BernsenThresholder  (class),  287 
bias,  171,  750,  752 
bicubic  interpolation,  553 
BicubicInterpolator (class), 560, 561

big  endian,  19,  20 
bilateral  filter,  420-432 
color,  424 
Gaussian,  423 
separable,  428 

BilateralFilter  (class),  449 
BilateralFilterColor  (alg.),  428 
BilateralFilterGray (alg.), 424
BilateralFilterGraySeparable  (alg.), 
432 


BilateralFilterSeparable (class), 449
bilinear 

interpolation,  551 
mapping,  525,  526 

BilinearInterpolator (class), 534, 560

BilinearMapping  (class),  533 
binarization,  59,  253 
binary 
code,  195 

image,  11,  132,  181,  209 
morphology,  181 
value,  19 

BinaryMorphologyFilter  (class), 
198-200 

BinaryMorphologyFilter.Box (class), 200
BinaryMorphologyFilter.Disk (class), 200

BinaryProcessor  (class),  59 
BinaryRegion  (class),  224,  246 
binning,  45-47,  54 
bit 

depth,  9 
mask,  296 
operation,  297 
bitmap  image,  11,  225 
bitwise  AND  operator,  768 
black  box,  101 

black-generation  function,  322 
blending,  85 

Blitter  (interface),  84,  85,  88,  145 
blob,  624 
block  sum 
first-order,  52 
second-order,  53 
blur 

filter,  89,  90 
Gaussian,  115 
blur  (method),  284 
blurFloat  (method),  284,  287 
blurGaussian  (method),  115,  284 
BMP,  18,  20,  299 
border  handling,  282 
boundary,  665 
pixels,  280 

bounding  box,  218,  231,  232,  239, 
241 

box  filter,  93,  103,  125,  283,  415 
Bradford  model,  356,  359 

BradfordAdaptation (class), 363
breadth-first, 212

BreadthFirstLabeling  (class),  246 
Brent’s  method,  696 
BrentOptimizer (class), 696
Bresenham algorithm, 177
brightness,  58,  263 

BuildGaussianScaleSpace  (alg.),  624 
BuildSiftScaleSpace  (alg.),  631 
byte,  19 

byte  (type),  767 

ByteProcessor  (class),  56,  84,  276, 
289,  301,  709 

C

C,  715 

camera  obscura,  4 
Canny  edge  operator,  132-138, 
404-406 

color,  404-406,  410 
grayscale,  410 

CannyEdgeDetector  (alg.),  135 
CannyEdgeDetector (class), 138, 410, 411

card,  38,  714,  715,  717 
cardinal  spline,  546 
cardinality,  714,  715,  717 
cascaded  Gaussian  filters,  616,  761 
Catmull-Rom  interpolation,  546 
CCITT,  12 

cdf,  see  cumulative  distribution 
function 
ceil,  714 

ceil  (method),  768 
center  line  detection,  194 
centralMoment  (method),  235 
centroid,  218,  233,  241,  673,  676, 
749 

CGM  format,  12 
chain  code,  226,  231 
chamfer 

algorithm,  577 
matching,  580 
ChamferMatcher (class), 585
characteristic  equation,  724 
Cholesky  decomposition,  755 
CholeskyDecomposition  (class), 
755 

chord  algorithm,  255 
chroma,  319 

chromatic  adaptation,  355 
Bradford  model,  356,  359 
XYZ  scaling,  355 
ChromaticAdaptation (class), 363
chromaticity  diagram,  365 
CIE,  341 

chromaticity  diagram,  342,  345 
L*a*b*,  323,  346,  347 
LAB,  346 

standard  illuminant,  344 


XYZ, 342, 346, 347, 352, 353, 361

CIELAB,  289,  381,  440 
CIELUV,  348,  381,  440 
circle,  176,  519,  674,  675 
circular  component,  328,  374 
circularity,  231 
circumference,  230 
city  block  distance,  577 
clamping,  58,  83,  94 
clone  (method),  324 
close  (method),  200 
closing,  192,  203 
clutter,  581 
CMYK,  320-323 
collectCorners  (method),  156 
Collections  (class),  157 
collinear,  733 
collision,  216 

Color  (class),  309-311,  360 
color 

covariance  matrix,  418 
difference,  350 
edge,  370,  391-410 
edge  magnitude,  399 
edge  orientation,  401 
filter,  367-389,  424,  438 
image,  11,  291-328 
keying,  316 
linear  mixture,  370 
management,  362 
out-of-gamut,  372 
picker,  328 
pixel,  294,  296 
saturation,  306 
space,  370-374 
table,  295,  299,  300,  326 
temperature,  344 
thresholding,  289 
color quantization, 43, 295, 301, 329-338
3:3:2,  330 
median-cut,  332 
octree,  333 
populosity,  331 
color  space,  303 
CMYK,  320 
colorimetric,  341-365 
HLS,  307 
HSB,  306,  361 
HSV,  306,  361 
in  Java,  358 
Kodak,  361 
LAB,  346 
LUV,  348 
RGB,  292 
sRGB, 350
XYZ,  342 
YCbCr,  319 
YIQ,  318 
YUV,  317 
color  system 
additive,  291 
subtractive,  320 

ColorCannyEdgeDetector  (alg.),  405 
ColorEdgeDetector  (class),  410 
ColorModel  (class),  300,  360 
ColorProcessor  (class),  296-299, 
302,  305,  324 

ColorQuantizer  (class),  337 
ColorSpace  (class),  359-361,  363 
column  vector,  720 
comb  function,  465 
commutativity,  186,  187 
compactness,  231 
Comparable  (interface),  776 
compareTo  (method),  155 
comparing  images,  565-584 
complementary  set,  184 
Complex  (class),  478,  705 
complex 

conjugate,  717 
number,  456,  717 
component 
histogram,  47 
ordering,  294 
compression,  42 
computeMatch  (method),  574 
computer 
graphics,  2 
vision,  3 

concatenation,  596,  714 
conditional  probability,  268,  757 
conductivity 
coefficient,  434 

function,  436,  438,  441,  442,  450 
conic  section,  519 
connected  components  problem, 
218 

container,  155 
Contour  (class),  224,  246 
contour,  131,  219-222 
contrast,  40,  58,  263 

automatic  adjustment,  61 
convertToByte  (method),  88,  145, 
224 

convertToByteProcessor (method), 305
convertToColorProcessor (method), 158
convertToFloat (method), 145
convertToFloatProcessor (method), 154, 281, 606, 662
convex  hull,  232,  241,  249,  369 
convexity,  232,  245 
convolution, 100-102, 283, 284, 368, 499, 568, 739
associativity,  102 
commutativity,  101 
linearity,  101 
property,  463,  496 
convolve  (method),  115,  145 
Convolver  (class),  115,  145 
convolveX  (method),  154 
convolveXY  (method),  154 
convolveY  (method),  154 
coordinate 

homogeneous,  515-516,  726-727 
transformation,  514 
COPY  (constant),  85 
copyBits  (method),  84,  88,  145 
Corner  (class),  155 
corner,  147 

detection,  147-159 
point,  159 

response  function,  149,  152 
strength,  149 

CorrCoeffMatcher (class), 574, 575
correlation,  100,  499,  567 
coefficient,  569 
cos  (method),  768 
cosine  function,  461 
1D, 454
2D,  483,  484 

cosine transform, 15, 503-511

cosine  window,  494,  495 
countColors  (method),  324 
covariance,  749 

efficient  calculation,  750 
matrix,  238,  244,  249,  750 
covariance  matrix 
color,  418 

create  (method),  560 
createProcessor  (method),  562 
createRealMatrix  (method),  727, 
729 

createRealVector  (method),  727 

creating  new  images,  56 

cross 

correlation,  570 
product,  694,  723 
CRT,  292 

CS_CIEXYZ  (constant),  361 
CS_GRAY  (constant),  361 
CS_LINEAR_RGB  (constant),  361 
CS_PYCC  (constant),  361 
CS_sRGB  (constant),  361 


cubic 

interpolation,  544,  547 
spline,  546 
cumulative 

distribution  function,  67,  264 
histogram,  49,  63,  66,  67 
cycle  length,  454 

D 

D50,  345,  358,  361 
D65,  345,  347,  351 
dB, see decibel
DCT,  503-511 
1D, 503-504
2D,  504-509 

DCT  (method),  506,  509,  510 
Dct1d (class), 509
Dct2d  (class),  509 
debugging,  114 
decibel,  338 
Decimate  (alg.),  624 
decimated  scale,  637 
decimation,  622 
deconvolution,  500 
delta  function,  464 
depth  of  an  image,  9 
depth-first, 212

DepthFirstLabeling  (class),  246 
derivative,  434 

estimation  from  discrete 
samples,  739 

first,  122,  150,  399,  610,  734,  736 
partial,  123,  397,  611,  715 
second,  130,  139,  611,  632 
desaturation,  306,  316 
selective,  317 
det,  714,  715 

determinant,  521,  635,  714,  715, 
724,  733,  745 

DFT,  469-501,  667-673,  715 
1D, 469-479
2D,  481-501 
forward,  668 
inverse,  668 
periodicity,  670,  679 
spectrum,  668 
truncated,  672,  673,  679 
DFT  (method),  478 
Di  Zenzo/Cumani  algorithm,  402 
diameter,  232 
DICOM,  26 

DIFFERENCE  (constant),  85 
difference 
filter,  99 
set,  717 


difference-of-Gaussians  (DoG), 
613,  763 

differential  equation,  434 
diffusion  process,  434 
digital  image,  7 
dilate  (method),  200,  201 
dilation,  185,  203,  251 
dimension,  749 

Dirac  function,  104,  186,  460,  464 
direction  of  maximum  contrast, 
404 

directional  gradient,  398,  737 
discrete 

cosine  transform,  503-511 
Fourier  transform,  469-501,  715 
sine  transform,  503 
disk  filter,  283 
distance,  566,  716 
city  block,  577 
Mahalanobis,  243,  249 
Manhattan,  577 
mask,  578 

maximum  difference,  567 
norm,  382,  656,  660 
squared,  157 
sum  of  differences,  567 
sum  of  squared  differences,  567 
transform,  576 
weighted,  243 
distance  norm,  379 
distanceComplex  (method),  706 
distanceMagnitude  (method),  706 
DistanceTransform (class), 582, 585

distribution 

normal  (Gaussian),  756-758 
uniform,  54,  64,  66 
divergence,  434,  442,  737,  738 
DIVIDE  (constant),  85 

DiZenzoCumaniEdgeDetector (class), 410

DOES_8C (constant), 300, 301
DOES_8G (constant), 28, 44
DOES_ALL (constant), 451
DOES_RGB (constant), 297, 298
DOES_STACKS (constant), 451
domain,  716 
filter,  420 

dominant  orientation,  637,  640 
dot  product,  722,  728 
dotProduct  (method),  728 
dots  per  inch  (dpi),  8,  476 
Double  (class),  770 
double  (type),  95 
dpi,  476 

drawCorner  (method),  158 
drawCorners (method), 158
drawLine  (method),  158,  179 
DST,  503 

duplicate (method), 110, 145, 281, 532
DXF  format,  12 
dynamic  range,  40 

E 

E  (constant),  768 
e,  715 
e,  715 

eccentricity,  237,  250 
Eclipse,  31,  32 
edge 

direction,  134 
linking,  137 
localization,  134 
map,  131,  132,  161 
normal,  401 
orientation,  392,  403 
sharpening,  139-146 
strength,  149,  392 
suppression,  634 
tangent,  134,  392,  446 
tracing,  135 
edge  operator,  124-410 
Canny,  132-138,  404-406 
compass,  128 
in ImageJ, 130
Kirsch,  129 
LoG,  130,  133 
monochromatic,  392-395 
Prewitt,  125,  133 
Roberts,  127,  133 
Robinson,  128 
Sobel,  125,  128,  130,  133 
vector-valued (color), 395-404
edge-preserving  smoothing  filter, 
413-451 
Edit  (menu),  33 
effective  gamma  value,  81 
EigenDecomposition (class), 729, 753

eigendecomposition,  753 
eigenpair,  724 
eigensystem,  446 

eigenvalue,  148,  149,  238,  399,  402, 
409,  446,  634,  723-726,  737, 
751 

ratio,  635 

eigenvector, 149, 400, 446, 723-726, 737
2x2  matrix,  724 
ellipse,  177,  238,  519,  677,  683 
parameters,  677 


elliptical  window,  493 
elongatedness,  237 
EMF  format,  12 

Encapsulated  PostScript  (EPS),  12 

entropy,  263,  264 

erode  (method),  200,  201 

erosion,  186,  203 

error  (method),  30 

Euclidean  distance,  157,  573 

Euler  number,  245 

Euler’s  notation,  456 

evidence,  269 

EXIF,  16,  351 

exp,  715 

exp  (method),  104,  768 
extractImage (method), 606
extremum  of  a  function,  633 

F 

T,  715 
false,  715 

fast  Fourier  transform,  479,  484, 
498 

FastIsodataThreshold (alg.), 260
FastKuwaharaFilter  (alg.),  417 
fax  encoding,  226 
feature,  229 
vector,  242 

FFT,  496,  see  fast  Fourier 
transform,  668 
Fiji,  25 
file format
BMP,  18 
EXIF,  16 
GIF,  13 
JFIF,  15 
JPEG-2000,  16 
magic  number,  20 
PBM,  18 
Photoshop,  20 
PNG,  14 
RAS,  19 
RGB,  19 
TGA,  19 
TIFF,  12-13 
XBM/XPM,  19 
fill  (method),  56 
filter,  89-118 

anisotropic  diffusion,  433-448 

bilateral,  420-432 

blur,  89,  90,  115 

border  handling,  92,  113 

box,  93,  98,  103,  125,  283,  415 

cascaded,  616 

color,  420,  424,  438 

color  image,  143,  367-389,  416 
computation, 93
debugging,  114 
derivative,  123 
difference,  99 
disk,  283 
domain,  420 
edge,  124-130 
edge-preserving  smoothing, 
413-451 
efficiency,  112 

Gaussian,  98,  103,  115,  134,  148, 
150,  283,  413,  423,  446,  610, 
617,  761-763 
HSV  color  space,  375 
ImageJ,  115-116 
impulse  response,  104 
in  frequency  space,  496 
indexed  image,  299 
inverse,  499 
jitter,  118 

kernel,  91,  100,  368,  392 
Kuwahara-type,  414-420 
Laplacian,  99,  117,  139,  145 
Laplacian-of-Gaussian,  610 
linear,  91-105,  115,  367-377,  739 
low-pass,  98,  284,  415,  623 
maximum,  105,  116,  207 
median,  107,  116,  181 
min/max,  281 
minimum,  105,  116,  207 
morphological,  181-208 
multi-dimensional,  379 
Nagao-Matsuyama,  415 
nonhomogeneous,  118 
nonlinear,  105-112,  116,  378-389 
normalized,  95 
Perona-Malik,  436-441 
range,  421 

scalar  median,  378,  388 
separable,  102,  103,  140,  284, 
613,  620 

sharpening  vector  median,  382 
smoothing,  94,  95,  98,  143,  368, 
370 

sombrero,  612 
successive  Gaussians,  616 
Tomita-Tsuji,  417 
Tschumperle-Deriche,  444-448 
unsharp  masking,  142 
vector  median,  378,  389 
weighted  median,  109 
final  (type),  771,  774 
Find_Corners  (plugin),  158 
Find_Straight_Lines (plugin), 173

FindCommands  (menu),  33 


findCorners (method), 157, 158

findEdges  (method),  130 

finite  differences,  434 

FITS,  26 

flat  image,  14 

Float  (class),  770 

floating-point  image,  11 

FloatProcessor  (class),  154 

flood  filling,  210-212 

floor,  714 

floor  (method),  768 
floorMod  (method),  767,  768 
Flusser’s  moments,  242 
foreground,  181,  254 
four-point  mapping,  519 
Fourier,  457 
analysis,  457 
coefficients,  457 
integral,  457 
series,  457 

shape  descriptor,  229,  665-711 
spectrum,  229,  458,  469 
transform, 454-501, 667-673, 715, 762

transform  pair,  459,  461,  462 
Fourier  descriptor,  665-711 
elliptical,  709 
from  polygon,  682 
geometric  effects,  687-692 
invariance,  692-700,  708 
Java  implementation,  704 
magnitude,  700 
matching,  700-704,  706 
normalization,  692-700,  707 
pair,  676-681 
phase,  690 

reconstruction,  668,  685 
reflection,  691 
start  point,  689 
trigonometric,  667,  682,  710 

FourierDescriptor  (class),  704 
FourierDescriptorFromPolygon (alg.), 685
FourierDescriptorFromPolygon (class), 707
FourierDescriptorUniform (alg.), 669, 673
FourierDescriptorUniform (class), 707
frequency,  454,  476 
2D,  486 

angular,  454,  455,  472,  482 
common,  455 
directional,  487 
distribution,  67 
effective,  486,  487 
fundamental, 457, 476
maximum,  468,  487 
space,  459,  475,  496 
Frobenius  norm,  418,  751 
fromCIEXYZ  (method),  358-360 
fromRGB  (method),  364 
function 

basis,  471-475 
complex-valued, 666
cosine,  454 
delta,  464 
Dirac,  460,  464 
distance,  700,  701 
gradient,  397 
hash,  701 
impulse,  460,  464 
Jacobian,  397 
partial  derivative,  397 
periodic,  454,  671 
scalar-valued, 735
sine,  454 

trigonometric,  134 
vector-valued, 395, 735
fundamental 

frequency,  457,  476 
period,  476 

G 

gamma  (method),  84 
gamma  correction,  74-82,  305, 
358,  361,  372 
applications,  78 
inverse,  82 
modified,  80-82,  352 
gamut,  321,  345,  351,  354 
garbage,  771 
Gaussian 

area  formula,  231 
component,  758 
derivative,  610 

distribution,  54,  258,  266,  268, 
269,  756,  758 

filter,  98,  103,  115,  148,  150, 
282,  423,  446,  610,  617, 
761-763 
filter  size,  103 
function,  460,  462 
kernel,  283 
mixture,  266 
noise,  758 
normalized,  284 
scale  space,  615,  761 
separable,  103 
successive,  616,  761 
weight,  638 
window,  492,  493,  495 


GaussianBlur  (class),  115,  145, 
284,  286,  287 

GaussianFilter  (class),  145 
GenericDialog  (class),  85,  86,  88, 
117 

GenericFilter  (class),  385,  389, 
449 

geometric  operation,  513-537 
get  (method),  29,  30,  58,  66,  113, 
307 

get2dHistogram  (method),  327 
getAccumulator (method), 174
getAccumulatorImage (method), 175
getAccumulatorMax (method), 175
getAccumulatorMaxImage (method), 175
getAngle (method), 176
getBlues  (method),  301,  302 
getBounds  (method),  606 
getCoefficient (method), 706
getCoefficients (method), 705
getColorModel  (method),  300,  301 
getComponents  (method),  361 
getCornerPoints  (method),  606 
getCount  (method),  176 
getCovarianceMatrix  (method), 
752 

getData (method),  727 
getDistance  (method),  176 
getEdgeBinary  (method),  410 
getEdgeMagnitude  (method),  410 
getEdgeOrientation (method), 410

getEdgeTraces  (method),  410 
getEigenvector  (method),  729 
getEntry  (method),  727 
getf  (method),  575,  576,  759 
getForegroundColor (method), 328

getGreens  (method),  301,  302 
getHeight  (method),  29,  30,  759 
getHistogram (method), 45, 56, 66, 71, 289

getImage (method), 30
getInnerContours (method), 224
getIntArray (method), 585
getInterpolatedValue (method), 560

getInverse (method), 532
getIteration (method), 606
getLines  (method),  174 
getMapSize  (method),  300,  301 
getMatch  (method),  575,  585,  604, 
606,  607 

getMatchValue (method), 576, 585
getMaxCoefficientPairs (method), 706

getMaxNegHarmonic  (method),  706 
getMaxPosHarmonic  (method),  706 
getNextChoiceIndex (method), 88
getNextNumber (method), 88
getOpenImages (method), 88
getOuterContours  (method),  224 
GetPartialReconstruction  (alg.),  684 
getPix  (method),  562 
getPixel  (method),  29,  113,  298, 
768 

getPixels  (method),  154,  297,  772 
getPixelSize  (method),  301 
getPolygon  (method),  538 
getProcessor  (method),  30,  88 
getRadius  (method),  176 
getRealEigenvalues (method), 729

getReconstruction (method), 706
getReconstructionPoint (method), 707
getReds (method), 301, 302
getReferenceMappingTo (method), 604, 606
getReferencePoint (method), 175, 176
getReferencePoints (method), 604

getRegions  (method),  224 
getRmsError  (method),  604,  606 
getRoi  (method),  538,  606 
getShortTitle  (method),  56,  88 
getSiftFeatures (method), 662
getSolver  (method),  730,  731 
GetStartPointPhase  (alg.),  698 
getThreshold  (method),  286,  288 
getType  (method),  30,  606 
getWeightingFactors  (method), 
305 

getWidth  (method),  29,  30,  759 
GIF,  13,  20,  26,  43,  226,  295,  299 
GIMP,  447 
global  operation,  57 

GlobalThresholder  (class),  284 
grad,  715,  736 

gradient,  122,  123,  148,  150,  392, 
434,  436,  633,  715,  736,  738 
directional,  398,  736,  737 
magnitude,  133,  637,  638 
maximum  direction,  737 
multi-dimensional,  397 
orientation,  133,  637,  638 
scalar,  397,  401 
vector,  133,  134 
vector field, 736


graph,  208,  218 
GRAY8  (constant),  30 
grayscale 

conversion,  304,  353 
image,  10,  14 
morphology,  202 

GrayscaleEdgeDetector  (class), 
410 


H 

H,  715 
h,  715 

Hadamard  transform,  510 
Hanning  window,  491,  492,  494, 
495 

harmonic  number,  671 
Harris  corner  detector,  148,  636 

HarrisCornerDetector  (class),  158 
hasComplexEigenvalues  (method), 
729 

hasConverged  (method),  604,  606 

hash  function,  701 

HDTV,  319 

heat  equation,  434 

Hertz,  455,  476 

Hessian  matrix,  443-445,  447,  448, 
630,  632-634,  647,  715,  738, 
739,  743 

discrete  estimation,  445 
Hessian  normal  form,  165,  173 
hexadecimal,  19,  296,  768 
hierarchical  technique,  131 
histogram,  37-55,  324-325,  715 
binning,  45 
calculation,  43 
color  image,  46 
component,  47 
cumulative,  49,  63,  67 
equalization,  63 
matching,  70 
multiple  peaks,  640 
normalized,  67 
orientation,  637,  639 
smoothing,  639 
specification,  66-73 
HLS,  306,  307,  311-314,  316 
HLStoRGB  (method),  315 
hom, 715, 726
homogeneous 

coordinate,  515-516,  715, 
726-727 

linear  equation,  724 
point  operation,  57,  64,  66 
region,  414 
homography,  524 
hot  spot,  91,  184 
Hough transform, 132, 161-180
algorithm,  168 
bias,  171 

edge  strength,  171 
ellipse,  177 
for  circles,  176 
for  lines,  176 
generalized,  178 
hierarchical,  172 
implementation,  173 
HoughLine  (class),  176 
HoughTransformLines (class), 173, 174

HSB,  see  HSV 

HSBtoRGB  (method),  311,  312,  361 
HSV,  289,  306,  309,  314,  316,  318, 
361 

HsvLinearFilter  (alg.),  377 
Hu’s  moments,  241 
Huffman  coding,  15 
hysteresis  thresholding,  134,  135 

I 

i,  456,  715,  717 
In,  716 
ICC,  358 
profile,  362 

ICC_ColorSpace  (class),  362 

ICC_Profile  (class),  362 

iconic  image,  14 

iDCT  (method),  506,  509,  510 

idempotent,  193 

identity  matrix,  442,  716,  724 

IJ  (class),  30 

IjUtils  (class),  88 

Illuminant  (enum-type),  363 

illuminant,  344 

image 

acquisition,  4 
analysis,  2 
binary,  11,  209 
bitmap,  11 
color,  11 
compression,  42 
coordinates,  9 
creating  new,  56 
defects,  41 
depth,  9,  11 
digital,  7 
display,  56 
file format, 11
flat, 14
floating-point, 11
grayscale,  10,  14 
iconic,  14 

indexed  color,  11,  14,  294,  337 


inpainting,  447 
intensity,  10 
matching,  565-584 
padding,  114 
palette,  11 
plane,  5 
pyramid,  621 
raster,  12 
redisplay,  35 
size,  8 

space,  101,  496 
special,  11 
stack,  451,  664 
true  color,  14 
vector,  12 
warping,  526 

ImageAccessor  (class),  560-562 
ImageExtractor  (class),  606,  607 
ImageInterpolator (class), 532
ImageJ, 23-35
debugging,  32 
filter, 115-116
geometric  operation,  531 
macro,  26,  31 
main  window,  26 
plugin,  26-31 
point  operation,  82-87 
program  structure,  26 
snapshot,  31 
stack,  25 
tutorial,  34 
undo,  26,  31 
website,  34 
ImageJ2, 25

ImagePlus  (class),  29,  30,  56,  158, 
299,  302,  538 

ImageProcessor  (class),  27,  29,  30, 
297,  298,  300-302,  307,  772 
ImageStack  (class),  608 
imagingbook  library,  VIII,  33,  34 
ImgLib2,  25 
impulse,  450 

function,  104,  460,  464 
response,  104,  190 
in  place  processing,  483 
IndexColorModel  (class),  301-303 
indexed  color  image,  11,  14,  294, 
295,  299,  337 

initializeMatch  (method),  606 

insert  (method),  145 

int  (type),  35,  767 

integral  image,  51-53,  289,  560 

IntegralImage (class), 53

intensity 

histogram,  47 
image,  10 
interest point, 147, 610
intermeans  algorithm,  258 
interpolation,  539-563,  594,  597 
1D, 539-549
2D,  549-556 
B-spline,  546,  547 
bicubic,  553,  556 
bilinear,  551,  556 
by  convolution,  543 
Catmull-Rom,  545,  546 
cubic,  544 
ideal,  540 
kernel,  543 

Lanczos,  548,  554,  563 
Mitchell-Netravali,  546,  547 
nearest-neighbor,  543,  550,  556, 
557 

spline,  546 

InterpolationMethod  (class),  560, 
562 

intersection 

in  Hough  space,  168 
line,  173,  179 
set,  191,  717 

invariance,  231,  234,  241,  244,  565, 
692-700 
rotation,  696 
scale,  693 
start  point,  694 
inverse 
filter,  499 
matrix,  599,  720 
power  function,  77 
tangent  function,  769 
inverse  (method),  728 
inversion,  59 
invert  (method),  59,  84 
Isodata 

clustering,  258 
thresholding,  258-260 
IsodataThreshold  (alg.),  259 
IsodataThresholder  (class),  285 
isotropic,  90,  98,  123,  140,  141,
148,  159,  188,  611
iterateOnce  (method),  604,  606 
ITU601,  319 

ITU709,  78,  82,  305,  319,  328,  351 

J

J,  716 

Jacobian  matrix,  397,  398,  716, 
736,  737 

Java 

applet,  25 
arithmetic,  765 
array,  771 


AWT,  27 
class  file,  31
compiler,  31,  772 
integer  division,  66,  765 
JVM,  20 

mathematical  functions,  768 
rounding,  769 
runtime  environment,  25 
virtual  machine,  20 
JavaScript,  34 
JBuilder,  31 
JFIF,  15,  18,  20 
jitter  filter,  118 
joint  probability,  757 
JPEG,  12,  14-18,  20,  26,  43,  226, 
295,  337,  351,  353,  508,  509 
JPEG-2000,  16 

K 

k-d  algorithm,  659 
kernel,  100 
key  point 

position  refinement,  632 
selection,  630 

Kimia  image  dataset,  242,  250, 
686,  711 

Kirsch  operator,  129 
Kodak  Photo  YCC  color  space, 
361 

kriging,  289 

Kronecker  product,  723 
Kuwahara-type  filter,  414-420 

KuwaharaFilter  (alg.),  416 
KuwaharaFilter  (class),  449 
KuwaharaFilterColor  (alg.),  418 

L 

LAB,  346 

LabColorSpace  (class),  359,  363, 
364 
label,  210 

Lanczos  interpolation,  548,  554, 
563 

LanczosInterpolator  (class),  560
Laplacian,  99,  434,  435,  444 
filter,  99,  139,  141,  145 
operator,  139,  611,  738 
Laplacian-of-Gaussian,  117,  610 
approximation  by  difference  of 
Gaussians,  613,  763 
normalized,  612 

left-sided  vector-matrix  product, 
721,  728 
Lena,  107 
lens,  6 

likelihood,  757 
line 
endpoints,  172
equation,  162,  165 
Hessian  normal  form,  165 
intercept/slope  form,  162 
intersection,  173 
linear 

blending,  85,  88 
convolution,  100-102 
correlation,  100 
equation,  723,  724 
transformation,  521 
linearity,  463 
lines  per  inch  (lpi),  8 
LinkedList  (class),  212 
List  (class),  771 
list,  713 

concatenation,  714 
little  endian,  19,  20 
local 

extremum,  630,  734 
mapping,  528 

structure  matrix,  148,  400,  402, 
445 

lock  (method),  33 
LoG 

filter,  117 
operator,  133 
log  (method),  31,  84,  768 
log-polar  matching,  574 
long  (type),  35 
lookup  table,  82 
low-pass  filter,  284,  415 
LSB,  19 

Lucas-Kanade  matcher,  587-608 

LucasKanadeForwardMatcher
(class),  605-607
LucasKanadeInverseMatcher
(class),  605-607

LucasKanadeMatcher  (class),  604,
606 

LUDecomposition  (class),  730
luma,  320,  354,  440 
luminance,  289,  304,  319,  320,  354, 
371,  440 
LUT,  200,  201 
LUV,  348 

LuvColorSpace  (class),  362 
LZW,  12,  13 

M 

machine  accuracy,  770 
macro  recorder,  33 
Macros  (menu),  34 
magic  number,  20 
magnitude,  714 

Mahalanobis  distance,  243,  249 


major  axis,  235 
makeCrf  (method),  154 
MakeDogOctave  (alg.),  631 
MakeGaussianKernel2D  (alg.),  285 
MakeGaussianOctave  (alg.),  624,  631

makeGaussKernel1d  (method),  104,
145 

makeIndexColorImage  (method),
301 

MakeInvariant  (alg.),  697
makeInvariant  (method),  707,  710
makeMapping  (method),  606 
MakeRotationInvariant  (alg.),  697
makeRotationInvariant  (method),
707 

MakeScaleInvariant  (alg.),  697
makeScaleInvariant  (method),  707

MakeStartPointInvariant  (alg.),  698
makeStartPointInvariant
(method),  707
makeTranslationInvariant
(method),  707 
Manhattan  distance,  577 
mapMultiply  (method),  728 
Mapping  (class),  533,  534 
mapping 

affine,  516,  517,  526 
bilinear,  525,  526 
four-point,  519 
linear,  521 
local,  528 
nonlinear,  526 
perspective,  520 
projective,  519-526 
ripple,  527 
spherical,  527 
three-point,  516 
twirl,  526 
mask,  142,  225 
MatchDescriptors  (alg.),  657 
matchDescriptors  (method),  663
matchHistograms  (method),  71 
matching,  700-704 
Math  (class),  768,  769 
matrix,  719,  731 
adjugate,  521,  715 
decomposition,  521,  731,  755 
Hessian,  443-445,  447,  448,  630, 
632-634,  647,  715,  738,  743 
identity,  442,  716,  724 
inverse,  599,  720,  728 
Jacobian,  397,  398,  716,  736,  737 
norm,  418,  751,  752 
rank,  716,  724 
singular,  724 
symmetric,  725 
trace,  716 
transpose,  716,  720 
MatrixUtils  (class),  727 
MAX  (constant),  85,  116 
max  (method),  84,  768 
MaxEntropyThresholder  (class), 
285 

maximum 

entropy  thresholding,  263-266 
filter,  207,  281 
frequency,  468,  487 
likelihood  estimation,  756 
local  contrast,  399 
MaximumEntropyThreshold  (alg.), 
267 

mean,  50-51,  53,  255,  257,  279, 
414,  749,  756,  758,  759 
from  histogram,  50 
vector,  749 

MeanThresholder  (class),  285 
Measure  (menu),  35 
media-oriented  color,  353 
medial  axis  transform,  194 
MEDIAN  (constant),  116 
median,  51,  256 

filter,  107,  116,  181,  378 
filter  (weighted),  109 
median-cut  algorithm,  332 
MedianCutQuantizer  (class),  337,
338 

MedianThresholder  (class),  285 
mesh  partitioning,  528 
Mexican  hat  filter,  99,  612 
mid-range,  257 
MIN  (constant),  85,  116 
min  (method),  84,  768 
MinErrorThresholder  (class),  285 
minimum  error  thresholding, 
266-272 

minimum  filter,  207,  281 
MinimumErrorThreshold  (alg.),  273 
Mitchell-Netravali  interpolation, 
547 

mixture  model,  758 
mod,  478,  716,  766 
mode,  756 

modified  auto-contrast,  62 
modulus,  see  mod 
moment,  226,  233-244 
central,  234 
Flusser,  242 
Hu,  241 
invariant,  241 
least  inertia,  235 


moment  (method),  235 
monochromatic  edge  detection, 
392-395 

MonochromaticColorEdge  (alg.),  395 
MonochromaticEdgeDetector 

(class),  410 
morphing,  529 

morphological  filter,  181-208 
binary,  181 
closing,  192,  203 
color,  202 
dilation,  185,  203 
erosion,  186,  203 
grayscale,  202 
opening,  192,  203 
outline,  189 
MPEG,  509 
MSB,  19 

mult  (method),  154 
multi-resolution  techniques,  131 
MultiGradientColorEdge  (alg.),  402 
MULTIPLY  (constant),  85 
multiply  (method),  84,  145,  728 
My_Inverter  (plugin),  29 

N 

N,  716 

𝒩,  254,  269,  756,  759
Nagao-Matsuyama  filter,  415
NaN  (constant),  770 
nCentralMoment  (method),  235 
nearest-neighbor  interpolation,  543 
NearestNeighborInterpolator
(class),  560 

negative  frequency,  676 

NEGATIVE_INFINITY  (constant),  770

neighborhood,  210,  230 

2D,  274,  380,  383,  421,  422,  609, 
746 

3D,  630,  633 
square,  415 
NetBeans,  31,  32 
neutral 

element,  104,  186,  616 
point,  343 

nextGaussian  (method),  54,  759 
nextInt  (method),  54
Niblack  thresholding,  275-279 

NiblackThreshold  (alg.),  281 
NiblackThresholder  (class),  286, 
287 

NiblackThresholderGauss  (class), 
287 

NIH-Image,  25 
nil,  716 
NO_CHANGES  (constant),  31,  44,  302
noise,  159 
energy,  338 
Gaussian,  758 
reduction,  413 
nominal  gamma  value,  81 
non-maximum  suppression,  133, 
137,  169 

nonhomogeneous  filter,  118 
nonhomogeneous  operation,  57 
norm,  379,  393,  394,  396,  425,  716 
Euclidean,  714,  720 
Frobenius,  418,  751 
matrix,  418,  751,  752 
vector,  720 

normal  distribution,  54,  756 
normalization,  95 
normalized 
histogram,  67 
kernel,  284,  369 
NormType  (class),  389 
NTSC,  78,  317,  318 
null  (constant),  771 
Nyquist,  468,  487 

O

OCR,  229,  245,  251,  279 
octave,  614,  617,  618,  621-624, 
628,  631,  642 
octree  algorithm,  333 
OctreeQuantizer  (class),  337 
open  (method),  200 
opening,  192,  203 
operate  (method),  728 
optical  axis,  5 
OR  (constant),  84 
orientation,  235,  486,  488 
dominant,  640 
histogram,  637 
orthogonal,  511 
oscillation,  454,  455 
Otsu’s  method,  260-263 
OtsuThreshold  (alg.),  262 
OtsuThresholder  (class),  285 
out-of-gamut  colors,  372 
outer  product,  103,  723,  728 
outerProduct  (method),  728 
outlier,  257 
outline,  189 

outline  (method),  200,  202 
OutOfBoundsStrategy  (class),  562

P 

packed  ordering,  294-296 
padding,  114,  222 
PAL,  78,  317 
palette,  295,  299,  300 


image,  see  indexed  color  image 
parabolic  fitting,  733-735 
parameter  space,  163 
partial 

derivative,  123,  715 
differential  equation,  434 
Parzen  window,  491,  492,  494,  495 
pattern  recognition,  3,  229 
PDF,  12 

pdf,  see  probability  density 
function 
perimeter,  230 
period,  454 

periodicity,  454,  482,  486,  489 
Perona-Malik  filter,  436-441 
color,  438 
gray,  436 

Perona_Malik_Demo  (plugin),  451 
PeronaMalikColor  (alg.),  442 
PeronaMalikFilter  (class),  450 
PeronaMalikGray  (alg.),  438 
perspective 
image,  177 
mapping,  520 
projection,  5 

phase,  455,  477,  690,  694,  695,  699 
angle,  455 

Photoshop,  20,  378,  393 
PI  (constant),  768 
PICT  format,  12 
piecewise  linear  function,  68 
pinhole  camera,  4 
pipette  tool,  328 
pixel,  4 
value,  9 

PixelInterpolator  (class),  532,
534,  560,  561 
PKZIP,  14 
planar  ordering,  294 
Plessey  detector,  148 
Plugin  (interface),  27,  30,  33 
PlugInFilter  (class),  606
PlugInFilter  (interface),  27,  29,
33,  35,  297,  389 
PNG,  14,  20,  26,  299,  351 
point  operation,  57-87 
arithmetic,  82 
effects  on  histogram,  59 
gamma  correction,  74 
histogram  equalization,  63 
homogeneous,  83 
in  ImageJ,  82-87
inversion,  59 
thresholding,  59 
point  set,  184 
point  spread  function,  105 
Point2D  (class),  538
polar  method,  758 
polygon,  667,  682 
area,  231 
path  length,  683 
uniform  sampling,  667,  710 
PolygonRoi  (class),  538 
PolygonSampler  (class),  708 
populosity  algorithm,  331 
positive  definite,  754 
POSITIVE_INFINITY  (constant), 
770 

posterior  probability,  268 
PostScript,  12 
pow  (method),  80,  768 
power  spectrum,  477,  485 
preMultiply  (method),  728 
Prewitt  operator,  125,  133 
primary  color,  292 
principal  curvature  ratio,  635 
print  pattern,  499 
prior  probability,  268,  273 
probability,  67,  756 
conditional,  268,  757 
density  function,  67,  264 
distribution,  67,  264 
joint,  757 
posterior,  268 
prior,  264,  268,  270,  273 
product 

cross,  714,  723 
dot,  722,  728 
matrix-vector,  721
outer,  714,  723,  728 
scalar,  722,  728 
vector,  722-723 

profile  connection  space,  358,  361 
projection,  244,  250,  325,  722 
projective  mapping,  519-526 
ProjectiveMapping  (class),  532,
534,  537,  604,  606 
pseudo-perspective  mapping,  520 
pseudocolor,  326 
putPixel  (method),  29,  113,  298, 
768 

pyramid,  131,  621 

Q 

Q,  522,  525 

QR  decomposition,  521 
QRDecomposition  (class),  731
quadratic  function,  632,  633,  640 
quadrilateral,  519,  716 
QuantileThreshold  (alg.),  257 
QuantileThresholder  (class),  285
quantization,  8,  59,  329-338 


linear,  330 
scalar,  329 
vector,  331 
quasi-separable,  613 

R 

R,  716 

radiusFromIndex  (method),  175
Random  (class),  758,  759 
Random  (package),  54
random 
image,  54 
process,  67 
variable,  67,  756 
random  (method),  54,  768 
range 

filter,  421 
rank,  716,  724 
rank  (method),  116,  281 
rank  ordering,  378 
RankFilters  (class),  116,  275,  276, 
281 

RAS  format,  19 
raster  image,  12 
RAW  format,  299 
RealMatrix  (class),  727,  729 
RealVector  (class),  727,  729 
Record  (menu),  34 
rectangular 
pulse,  460,  462 
window,  493 

RecursiveLabeling  (class),  246 
redisplaying  an  image,  35 
reflection,  185,  187 
refraction  index,  528 
region,  209-251 
area,  231,  234,  249 
centroid,  233,  249 
convex  hull,  232 
diameter,  232 
eccentricity,  237 
homogeneous,  414 
labeling,  210-219 
major  axis,  235 
matrix  representation,  225 
moment,  233 
orientation,  235 
perimeter,  230 
projection,  244 
run  length  encoding,  225 
topological  property,  244 
region  of  interest,  327,  536,  538, 
605,  606 

RegionContourLabeling  (class), 
224,  246 

RegionLabeling  (class),  246 
relative  colorimetry,  355
remainder  operator,  767 
resampling,  529 
resolution,  8 
RGB 

color  image,  291 
color  space,  292,  316 
format,  19 

RGBtoHLS  (method),  314 
RGBtoHSB  (method),  310,  361 
RGBtoHSV  (method),  311 
right-sided  vector-matrix  product, 
721,  728 

rint  (method),  768 

ripple  mapping,  527 

RippleMapping  (class),  533 

Roberts  operator,  127,  133 

Robinson  operator,  128 

Roi  (class),  538,  606 

Rotation  (class),  532,  533 

rotation,  241,  497,  513,  515,  688 

round,  84,  716 

round  (method),  80,  768 

rounding,  58,  84,  766,  769 

roundness,  231 

row  vector,  720 

run  (method),  27 

run  length  encoding,  225 

S

S₁,  522,  525,  716
saddle  point,  744 
sample,  749 
mean,  749 
variance,  749 
samplePolygonUniformly
(method),  708 
sampling,  464-666 
frequency,  487 
interval,  466,  467 
spatial,  7 

theorem,  468,  473,  475,  487,  540 
time,  7 

saturation,  41,  306 
Sauvola  thresholding,  279 
SauvolaThresholder  (class),  287 
scalar 

field,  735-739
median  filter,  378,  388 
product,  722,  728 

ScalarMedianFilter  (class),  386 
scalarMultiply  (method),  728 
scale 

absolute,  617,  621 
base,  621 
change,  688 


decimated,  637 
increment,  630 
initial,  617 
ratio,  617 
relative,  618 
scale  space,  610 

decimation,  621,  622 
discrete,  616 
Gaussian,  615 
hierarchical,  620,  623 
LoG/DoG,  619,  623 
octave,  621
SIFT,  624-636 
spatial  position,  623 
sub-sampling,  621 
Scaling  (class),  532 
scaling,  241,  513,  515 
segmentation,  253,  289 
separability,  102,  117,  188,  284, 
507 

separable  filter,  99,  140,  613 
sequence,  713 

SequentialLabeling  (class),  246
Set  (class),  771 
set,  184,  713 
difference,  717 
intersection,  717 
union,  717 

set  (method),  29,  30,  58,  66,  113, 
307 

setCoefficient  (method),  706
setColor  (method),  158 
setColorModel  (method),  300, 
301,  303 

setEntry  (method),  727 
setf  (method),  759 
setNormalize  (method),  115,  145 
setPix  (method),  562 
setRGBWeights  (method),  305 
setup  (method),  27,  28,  31,  297, 
300,  411 

setValue  (method),  56 
Shah  function,  465 
Shannon,  468 
shape 

feature,  229 
number,  228,  249 
reconstruction,  668,  679,  681, 
684,  685,  706 
representation,  208 
rotation,  688 
sharpen  (method),  145 
sharpening  vector  median  filter, 
382 

SharpeningVectorMedianFilter  (alg.), 
384 
Shear  (class),  532
shearing,  515 
SphereMapping  (class),  533
shift  property,  463 
ShortProcessor  (class),  289 
show  (method),  56,  158,  299 
showDialog  (method),  88 
SIFT,  609-664 

algorithm  summary,  647 
descriptor,  640-647 
examples,  654-657 
feature  matching,  648-660 
implementation,  634,  661-663 
parameters,  648 
scale  space,  624-636 
SiftDescriptor  (class),  662 
SiftDetector  (class),  662 
SiftMatcher  (class),  663 
signal 

energy,  338 
space,  101,  459,  475 
signal-to-noise  ratio,  338 
similarity,  463 
sin  (method),  768 
Sinc  function,  460,  541,  550
sine 

function,  454,  461 
transform,  503 

singular-value  decomposition,  731 
SingularValueDecomposition 
(class),  731 
size  (method),  706 
skeletonize  (method),  202,  208 
skew  angle,  251 
smoothing  filter,  91,  94,  283 
SNR,  338 

Sobel  operator,  125,  133,  392,  394 
extended,  128 
solve  (method),  730,  731 
sombrero  filter,  612 
sort  (method),  110,  157,  324,  776 
sorting  arrays,  776 
source-to-target  mapping,  530 
spatial  sampling,  7 
special  image,  11 
spectrum,  453 
spherical  mapping,  527 
spline 

cardinal,  546 
Catmull-Rom,  545-547 
cubic,  546,  547 
cubic  B-,  546,  547,  563 
interpolation,  546 
SplineInterpolator  (class),  560
sqr  (method),  84,  154 
sqrt  (method),  84,  768 


square  window,  495 
squared  local  contrast,  398,  402 
sRGB,  81,  82,  305,  350,  352,  353 
ambient  lighting,  345 
grayscale  conversion,  353 
white  point,  345 
stack,  210,  299 

standard  deviation,  54,  275,  614, 
716 

standard  illuminant,  344,  355 
statistical  independence,  756 
step  edge,  370 
structure  matrix,  447 
structuring  element,  184,  188,  202 
sub-pixel  accuracy,  745 
sub-sampling,  623 
SUBTRACT  (constant),  85,  145 
summed  area  table,  51 
super-Gaussian  window,  492,  493 
SVD,  521 
symmetry,  691 
System.out  (constant),  31

T 

T,  716 
t,  716 

tan  (method),  768 
tangent  function,  769 
target-to-source  mapping,  526,  530 
Taylor  expansion,  633,  740 
multi-dimensional,  740 
template  matching,  565,  566 
temporal  sampling,  7 
TGA  format,  19 
thin  (method),  200,  208 
thin  lens,  6 
thinning,  194-195 
thinOnce  (method),  200 
three-point  mapping,  516 
threshold,  59,  132,  169 
threshold  (method),  59,  288 
threshold  surface,  288 
Thresholder  (class),  284 
thresholding,  131,  253-289 
Bernsen,  274-275 
color  image,  289 
global,  253-272 
hysteresis,  134 
Isodata,  258-260 
local  adaptive,  273-284 
maximum  entropy,  263-266 
minimum  error,  266-272 
Niblack,  275-279 
Otsu,  260-263 
shape-based,  255 
statistical,  255 
Sauvola,  279

TIFF,  12,  16,  18,  20,  26,  226,  299 
time  unit,  455 
toArray  (method),  157,  727 
toCIEXYZ  (method),  358-361 
toDegrees  (method),  768 
Tomita-Tsuji  filter,  417 
topological  property,  244 
toRadians  (method),  768 
toRGB  (method),  364 
total  variance,  418,  751 
trace,  419,  443,  444,  716,  737,  738, 
751,  752 

tracking,  147,  607,  664 
transform  pair,  459 
TransformJ  (package),  531
Translation  (class),  532,  604 
translation,  241,  515,  687 
transparency,  85,  296,  303 
transpose  of  a  matrix,  720 
tree,  210 

triangle  algorithm,  255 
trigonometric  coefficient,  684 
trimmed  aggregate  distance,  385 
tristimulus  value,  344 
true,  716 

true  color  image,  11,  14,  293,  295,  296
truncate  (method),  706,  710 
truncated  spectrum,  672,  673 
truncation,  84 
Tschumperle-Deriche  filter, 
444-448 

TschumperleDericheFilter  (alg.),  448 
TschumperleDericheFilter 

(class),  450 
tuple,  713 
twirl  mapping,  526 
TwirlMapping  (class),  533 
type  cast,  58,  766 

U

undercolor-removal  function,  322 
uniform  distribution,  54,  64,  66 
union,  717 
unit  square,  525 
unit  vector,  398,  400,  630,  715, 
736,  737 

unlock  (method),  33 
unsharp  masking,  142-146 
UnsharpMask  (class),  145 
unsharpMask  (method),  145 
unsigned  byte  (type),  767 
updateAndDraw  (method),  30,  35


V 

variance,  50-51,  53,  256,  275,  414, 
415,  569,  716,  749,  750,  756, 
759,  761 

between  classes,  261 
bias,  750 

fast  calculation,  50 
from  histogram,  50 
local  calculation,  279 
total,  418,  751,  752 
within  class,  261 
variate,  749 
vector,  713,  719-731 
column,  720 

field,  391,  395,  397,  406,  735-739
image,  12 
length,  720 

median  filter,  378,  389 
norm,  720 
product,  722-723 
row,  720 

unit,  398,  400,  630,  715,  736,  737 
zero,  715 

VectorMedianFilter  (alg.),  381 
VectorMedianFilter  (class),  386, 
389 

VectorMedianFilterSharpen 

(class),  386 
video,  608 
viewing  angle,  345 

W

Walsh  transform,  510 
warping,  526 

wasCanceled  (method),  88 
wave  number,  472,  482,  487,  504 
wavelet,  510 

website  for  this  book,  34 
weighted  distance,  243 
white  point,  308,  344,  347 
D50,  345,  358 
D65,  345,  351 
windowed  matching,  573 
windowing  function,  490-491 
Bartlett,  492,  494,  495 
cosine²,  494,  495
elliptical,  492,  493 
Gaussian,  492,  493,  495 
Hanning,  492,  494,  495 
Parzen,  492,  494,  495 
rectangular  pulse,  493 
super-Gaussian,  492,  493 
WMF  format,  12 

X

XBM/XPM  format,  19 
XOR,  191,  716 
XYZ

color  space,  304,  341-346,  371 
scaling,  355 

Y 

YCbCr,  319 
YIQ,  318 
YUV,  317-319 

Z

Z,  716 

zero  vector,  715 

ZIP,  12 